*Article* **Exploring the Predictors of Co-Nationals' Preference over Immigrants in Accessing Jobs—Evidence from World Values Survey**

**Daniel Homocianu**

Department of Accounting, Business Information Systems, and Statistics, Faculty of Economics and Business Administration, Alexandru Ioan Cuza University, 700505 Jassy, Romania; daniel.homocianu@uaic.ro

**Abstract:** This paper presents the results of an exploration of the most resilient influences on the attitude that co-nationals should be prioritized over immigrants for access to employment. The source data were from the World Values Survey. After many selection and testing steps, a set of the seven most significant determinants was produced (a model with fair-to-good prediction accuracy). These seven determinants (a hepta-core model) correspond to features, beliefs, and attitudes regarding emancipative values, gender discrimination, immigrant policy, trust in people of another nationality, inverse devoutness or making parents proud as a life goal, attitude towards work, the post-materialist index, and job preferences inclined more towards self than community benefits. Additional controls revealed the significant influence of several socio-demographic variables: gender, the number of children, the highest education level attained, employment status, income scale positioning, settlement size, and the interview year. The selection and testing steps drew on many principles, methods, and techniques (e.g., triangulation via adaptive boosting (in the Rattle library of R) and pairwise correlation-based data mining (PCDM), LASSO, OLS, binary and ordered logistic regressions (LOGIT, OLOGIT), and prediction nomograms, together with tools for reporting default and custom model evaluation metrics, such as ESTOUT and MEM in Stata). Cross-validations relied on random subsamples (CVLASSO) and well-established ones (mixed-effects). In addition, overfitting removal (RLASSO), reverse causality, and collinearity checks succeeded under full conditions for replicating the results. The prediction nomogram corresponding to the most resistant predictors identified in this paper is also a powerful tool for identifying risks and can therefore provide strong support for decision makers in matters related to immigration and access to employment.
The paper's novelty also results from the many robust supporting techniques that allow randomly and non-randomly cross-validated, fully reproducible results based on a large amount and variety of source data. The findings also represent a step forward in migration and access-to-job research.

**Keywords:** immigration; access to employment; regression and classification models; collinearity and reverse causality checks; performance comparisons and reporting; triangulation; cross-validations; full support for replication of results

**MSC:** 60-02

#### **1. Introduction**

A well-known saying by Andrew Smith states: "People fear what they don't understand and hate what they can't conquer". Migration is a generalized phenomenon as old as humanity [1], and it seems to be present in all historical periods and on all continents. It has also become an issue of growing public concern [2]. In today's highly globalized and knowledge-based economies [3], migration affects individuals and societies in multiple dimensions [4]. According to Kanbur and Rapoport (2005) [5], its effects apply to both countries of origin and destination, and some of them relate to brain drain and widening income gaps [6].

**Citation:** Homocianu, D. Exploring the Predictors of Co-Nationals' Preference over Immigrants in Accessing Jobs—Evidence from World Values Survey. *Mathematics* **2023**, *11*, 786. https://doi.org/10.3390/math11030786

Academic Editors: Alexandru Agapie, Denis Enachescu, Vlad Stefan Barbu and Bogdan Iftimie

Received: 21 December 2022; Revised: 30 January 2023; Accepted: 1 February 2023; Published: 3 February 2023

**Copyright:** © 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Among the motivations for migration is the search for jobs [7,8], which underlies the hope of a stable [9], if not better, life [10]. This hope seems natural to human beings [11]. Sensitivity to immigration, a process that affects both the immigrants and the native population [12], depends significantly on the country under consideration [13]. A well-known example of negative public perception is the concern that immigrants take the jobs of native-born workers [14–16]. This concern can translate into negative feelings of native residents towards immigrants and even less supportive attitudes towards pro-immigration policies [17], rather as an expression of fear. These labor-market-related concerns [18], considered together with other economic worries such as competition for economic and political power and for social status, and with concerns about crimes affecting individual security and material welfare, form a large category known as realistic threats [19], which may even be an expression of hatred.

In the same category of realistic threats (many of a macroeconomic nature), we can find another explanation for negative perceptions of immigrants. This explanation seems to be related to the competition for limited resources [20–22] as a primary source of conflicts of interest between groups [23], mainly focused on cost–benefit reasons coupled with other considerations such as geographical disproportions [24].

Other studies focus more on socio-demographic and individual features. They show that women and those with higher education and income were more positive toward immigration, whereas older people and people with more seniority at work were considerably more negative [25]. The age-related finding is confirmed by studies comparing young people with adults in these specific terms [26]. Still, recent studies indicate that younger generations may, in fact, harbor more negative attitudes towards immigrants [27]. In addition, people who subscribe to conservative political ideologies are more likely to show negative attitudes toward immigrants [28]. Moreover, some personality traits, such as social dominance orientation and right-wing authoritarianism, which reflect attitudes toward social hierarchy, equality, respect for authority, and traditional values, can lead individuals to perceive immigrants as inferior or even as a threat [29,30].

Regarding another category of threats, namely the symbolic ones, Mangum and Block (2018) [31] consider that social identity affects public opinion on immigration and immigrants. In these terms, cultural differences coupled with the size of the minority group can act as threats to the values and identity of the majority [32]. Closely related to individual traits, other scholars [33] have shown that more educated people place a much higher value on the cultural diversity of society, believing that immigration generates benefits for society. The latter suggests that education is a transformative force capable of changing individual and collective values, and also encouraging people to be more confident, tolerant, and open [34].

Therefore, in addition to obvious reasons such as fear or hatred, attitudes towards immigrants and their access to jobs depend to a large extent on a whole range of more complex factors related to individual and group characteristics, including personality traits, age, level of education, transmitted and developed values and attitudes, cultural diversity, and the policies related to these phenomena. This list is not claimed to be exhaustive.

The article further reviews the literature on perceptions of both migration and migrants as potential job occupants. It then describes the data and methodology used, before presenting and discussing the main findings in dedicated sections. These capture the focus of the current study, namely the discovery of the determinants of the public's preference for citizens over immigrants regarding access to jobs. This is achieved by emphasizing causal relations and eliminating redundancies after performing many robustness checks.

#### **2. Related Work**

According to Ambrosini (2013) [35], at a certain point, many local governments developed policies excluding immigrants, motivated by security, the priority of national citizens' access to various social benefits, and the defense of the cultural identity of the territory. The relationship can also work the other way around: such policies inevitably generate some perceptions [36] and indirectly change the public perception of immigrants. In some cases, they can destabilize the moral panics nurtured by this perception [37]. In any case, the relationship between the two exists and has been a source of debate in the literature [2,38,39]. Ivarsflaten (2005) [40] compared the influence exerted by elites, which can potentially change public perceptions, with that of the public perception that diversity poses a threat, and concluded that the former is less significant.

Regarding the Big Five personality traits and their potential impact on immigration acceptance, Rueda (2018) [41] stated that altruism is an important omitted variable in many political economy studies, which focus on self-interest rather than on aversion to inequality. Stafford (2020) [42] examined the relationship between attitudes towards immigration and the Big Five personality traits. She found that personality traits, especially those related to altruism, are not just simple influences but essential determinants of attitudes toward immigrants, even with controls for political predispositions and sociodemographic characteristics.

Kunst et al. (2015) [43] discuss the common identity notion, which seems to be crucial for securing the altruistic efforts of the majority to integrate immigrants and, thus, for achieving functional multiculturalism. Still, some research on multicultural beliefs [44] has shown that multiculturalism can cause negative reactions against immigrants and minority groups. This is because the members of the majority sometimes perceive it as threatening their position and identity [45]. Moreover, other studies [46,47] suggest a strong relation between immigration acceptance and emancipative and democratic values. The latter is not necessarily incompatible with the idea of multiculturalism [48]. On the other hand, the perceived high discrimination and lack of acceptance hinder the positive impact of any integration guidelines [49].

Regarding interpersonal trust, according to Pellegrini et al. (2021) [50], it mediates the relationship between experienced social exclusion and anti-immigrant attitudes: the experience of being socially excluded reduces generalized interpersonal trust, which, in turn, promotes hostile attitudes towards immigrants. Rustenbach (2010) [51] found this type of trust to be a strong predictor of anti-immigrant attitudes.

According to Ensign and Robinson (2011) [52], conventional thinking suggests that immigrants have no choice but to work as entrepreneurs or be self-employed, which somewhat undermines the idea that entrepreneurial attitudes make them migrate. Moreover, employers assign particular meanings to the migrant identity [53], which allows them to enjoy the benefits of cheap, exploitable, and hardworking employees. In some cases, migrants use this identity to obtain jobs while enduring exploitation, including the peculiar form of working below their skill level. Still, their acceptance of hard work at lower wages [54] is explained by the immigrant workers' dreams of future self-employment.

Therefore, considering the arguments presented here and in the Introduction section, the main hypotheses of this paper are:

**H1.** *The opinion on immigration policy is closely related to or even a determinant of the level of public acceptance of immigrants as potential job occupants [35,55]*.

**H2.** *Those who subscribe to altruism [56] (including working for the benefit of large communities), emancipative values [57], opposition to discrimination of any type [58], ideologies including multiculturalism [59], and trust in people regardless of their origins are more inclined to accept immigrants when it comes to access to jobs*.

**H3.** *The ones being more attached to their cultural values and traditions [60] as part of their national identity [61–63] are more likely to be against immigrants as potential job occupants*.

**H4.** *The attitude towards work and entrepreneurship (as an expression of independence) could be a determinant for this specific type of immigrant acceptance [64–66]*.

**H5.** *The respondent's socio-demographic features are also significant predictors for this kind of acceptance [67,68]*.

#### **3. Materials and Methods**

This article started from one of the most comprehensive datasets of the World Values Survey (WVS). This dataset (version 1.6, WVS\_TimeSeries\_stata\_v1\_6.dta) includes 1045 variables and 426,452 observations. Before its .csv export, a simple binary variable (C002bin) was derived from the original variable to analyze (C002, Jobs scarce: Employers should prioritize nation people than immigrants) by keeping only the two extremes of its original three-point scale (Agree and Disagree) and leaving aside the middle option (Neither; Tables A1 and A2, Appendix A). The option to generate numerical values for labeled variables was enabled when exporting.
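A minimal sketch of this binary derivation follows. It assumes the usual WVS coding of C002 (1 = Agree, 2 = Disagree, 3 = Neither, negative codes for missing/DK/NA) and assumes that C002bin takes 1 for Agree and 0 for Disagree; the paper does not state the direction of the recoding, and the function name `derive_c002bin` is hypothetical.

```python
from typing import Optional

# Assumed WVS codes for C002 (check against the official codebook).
AGREE, DISAGREE, NEITHER = 1, 2, 3


def derive_c002bin(c002: Optional[int]) -> Optional[int]:
    """Map C002 to a hypothetical binary target C002bin.

    Keeps only the two extremes of the three-point scale
    (Agree -> 1, Disagree -> 0); 'Neither' and missing/DK/NA
    codes (negative values or None) become None and are excluded.
    """
    if c002 == AGREE:
        return 1
    if c002 == DISAGREE:
        return 0
    return None  # Neither, DK/NA, or missing


responses = [1, 2, 3, -1, 1, None, 2]
print([derive_c002bin(r) for r in responses])
# [1, 0, None, None, 1, None, 0]
```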

The next step was to load this .csv export into the Rattle data mining interface (version 5.4.0) of R, set C002bin as the target, exclude its source (C002) from the list of inputs, and apply the adaptive boosting technique for decision tree classifiers [69]. This step was performed [70,71] using default settings (Figure 1) to discover the most important related variables, and it constituted the 1st data mining and selection round.
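To illustrate how boosting surfaces important variables, the sketch below implements discrete AdaBoost with one-split decision stumps and counts how often each input is chosen for a split, a rough analogue of the variable-usage frequencies discussed with Figure 1. It is not Rattle's implementation: Rattle grows deeper decision trees through R packages, whereas the stumps, the number of rounds, and all function names here are simplifying assumptions.

```python
import math
from collections import Counter


def stump_predict(row, j, t, polarity):
    """A stump predicts +polarity when x_j <= t and -polarity otherwise."""
    return polarity if row[j] <= t else -polarity


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = (float("inf"), None, None, None)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, t, polarity) != yi)
                if err < best[0]:
                    best = (err, j, t, polarity)
    return best


def adaboost_usage(X, y, rounds=10):
    """Discrete AdaBoost with stumps; y must be coded -1/+1.

    Returns the fitted ensemble and a Counter with the number of rounds
    in which each feature index was used for a split.
    """
    n = len(X)
    w = [1.0 / n] * n
    ensemble, usage = [], Counter()
    for _ in range(rounds):
        err, j, t, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) on perfect splits
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, t, pol))
        usage[j] += 1
        # Up-weight misclassified observations, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, t, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble, usage


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, j, t, p) for a, j, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[1, 5], [2, 3], [3, 9], [4, 1], [5, 7], [6, 2]]
y = [-1, -1, -1, 1, 1, 1]
ensemble, usage = adaboost_usage(X, y)
print(usage)  # Counter({0: 10})
```

With real survey data, the counts spread across many inputs, and ranking them gives a usage-based importance list similar in spirit to the one in Figure 1.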


The set of variables was then consolidated. It comprised those remaining after the previous step and, in some cases (e.g., aggregate indexes), their source variables.

The 2nd selection round applied a set of filters to each pair formed by a recoded variable from the previous step and the variable to analyze: a minimum threshold of 0.1 [72] for the absolute value of the pairwise correlation coefficient [73], a significance requirement (p < 0.001), and a minimum support in terms of valid observations (at least a third of the total number).
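The three filters of this round can be sketched as below. This is only an illustration, not the PCDM command itself: the helper names are hypothetical, missing values are assumed to be `None` and deleted pairwise, and the p-value uses a large-sample normal approximation to the usual t test of a correlation coefficient. With the 426,452 observations of the dataset, the support filter works out to N ≥ 142,151, i.e., N > 142,150.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (returns 0.0 if a variable is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) with a normal approximation
    to the t distribution, which is adequate for very large n.
    """
    if abs(r) >= 1:
        return 0.0
    t = abs(r) * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(t / math.sqrt(2))


def min_support(total_obs):
    """Smallest N covering at least a third of all observations."""
    return -(-total_obs // 3)  # integer ceiling of total_obs / 3


def passes_filters(xs, ys, total_obs, min_abs_r=0.1, max_p=0.001):
    """Apply the three filters to one candidate/target pair."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < max(3, min_support(total_obs)):
        return False
    r = pearson_r([x for x, _ in pairs], [y for _, y in pairs])
    return abs(r) >= min_abs_r and approx_p_value(r, n) < max_p


print(min_support(426452))  # 142151
```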

A processing/recoding phase followed, involving all variables remaining after the 2nd selection round, as well as several socio-demographic variables used for controls and cross-validations. It mostly meant removing missing and DK/NA (do not know/no answer) values [74] and reversing scales whose larger values indicate lower, rather than higher, intensities.
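The two recoding operations can be sketched as follows, assuming (as is usual in WVS files) that missing, DK, and NA answers carry negative codes and that any value outside the valid scale range is treated as missing. The function name and the 1 to 4 example scale are hypothetical; the actual per-variable treatment is documented in Listing A1.

```python
from typing import Optional


def recode_nt(value: Optional[int], scale_min: int, scale_max: int,
              reverse: bool = False) -> Optional[int]:
    """Hypothetical 'null treatment': drop invalid codes, optionally reverse the scale.

    Negative WVS-style codes (e.g., -1 don't know, -2 no answer) and any other
    value outside [scale_min, scale_max] become None. Reversal maps
    scale_min <-> scale_max so that larger values mean higher intensity.
    """
    if value is None or not scale_min <= value <= scale_max:
        return None
    return scale_min + scale_max - value if reverse else value


# A hypothetical 1-4 item where 1 means "strongly agree": reverse it.
print([recode_nt(v, 1, 4, reverse=True) for v in [1, 2, 3, 4, -1, -2]])
# [4, 3, 2, 1, None, None]
```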

Next, the 3rd selection phase used mixed-effects modeling [75–77] in Stata 17 MP (64-bit version). The models included both fixed effects (the variables remaining after the 2nd selection phase and recoded at the previous step; top of Table A1, Appendix A) and random effects (clusters on gender, age, marital status, number of children, education level, income level, professional situation, region, settlement size, and survey year; bottom of Tables A1 and A2, Appendix A). Only variables that remained significant regardless of the clustering criterion and the mixed-effects regression type (melogit for the binary form of the response variable and meologit for its form with values on the original scale) passed this selection point.
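The selection rule of this phase (a variable survives only if it stays significant in every scenario) can be sketched as follows. The scenario labels, the dictionary layout, and the 0.05 significance level are assumptions for illustration; the p-values would come from the melogit and meologit fits in Stata, which are not reproduced here.

```python
def robust_variables(p_values, alpha=0.05):
    """Variables significant (p < alpha) in every scenario.

    p_values maps a scenario label (model type and clustering criterion)
    to a dict {variable: p-value}; a variable absent from a scenario fails it.
    """
    scenarios = list(p_values.values())
    candidates = set().union(*scenarios)
    return sorted(v for v in candidates
                  if all(s.get(v, 1.0) < alpha for s in scenarios))


def failed_scenarios(p_values, variable, alpha=0.05):
    """Labels of the scenarios in which a variable loses significance."""
    return [label for label, s in p_values.items()
            if not s.get(variable, 1.0) < alpha]


# Hypothetical p-values for two scenarios.
example = {
    "melogit | cluster=gender": {"E143nt": 0.0001, "Y003nt": 0.2},
    "meologit | cluster=S020": {"E143nt": 0.0002, "Y003nt": 0.01},
}
print(robust_variables(example))            # ['E143nt']
print(failed_scenarios(example, "Y003nt"))  # ['melogit | cluster=gender']
```

Listing the failed scenarios per variable mirrors how the Results report, for each rejected influence, the models in which it lost significance.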

Next, the 4th selection round also took place in Stata. It consisted of successive invocations (stages) of two commands from the LASSO [78] package, CVLASSO (to perform cross-validation on random subsamples) and RLASSO (to control overfitting), repeated until no further variables were dropped.
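The staged logic (re-running the selector on its own output until the selected set stops shrinking) can be sketched with a plain coordinate-descent LASSO. This is a simplified stand-in, not CVLASSO or RLASSO: here the penalty lambda is fixed by hand, whereas CVLASSO chooses it by cross-validation and RLASSO uses a theory-driven penalty. Predictors are assumed numeric and non-constant, and the function names are hypothetical.

```python
import math


def standardize(col):
    """Center and scale to unit (population) variance; assumes a non-constant column."""
    n = len(col)
    m = sum(col) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / n)
    return [(v - m) / sd for v in col]


def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(columns, y, lam, n_iter=200):
    """Coordinate-descent LASSO: min (1/2n)||y - Xb||^2 + lam * ||b||_1 on standardized X."""
    n = len(y)
    X = [standardize(c) for c in columns]
    my = sum(y) / n
    r = [v - my for v in y]  # residuals with all coefficients at zero
    beta = [0.0] * len(X)
    for _ in range(n_iter):
        for j, xj in enumerate(X):
            # Correlation of x_j with the partial residual (x_j's own contribution added back).
            rho = sum(xi * ri for xi, ri in zip(xj, r)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                r = [ri - delta * xi for ri, xi in zip(r, xj)]
                beta[j] = new
    return beta


def select_until_stable(columns_by_name, y, lam, max_stages=10):
    """Repeat LASSO on the surviving variables until no further variable is dropped."""
    selected = list(columns_by_name)
    for _ in range(max_stages):
        beta = lasso_cd([columns_by_name[k] for k in selected], y, lam)
        kept = [k for k, b in zip(selected, beta) if b != 0.0]
        if kept == selected:
            break
        selected = kept
    return selected


# Toy data: y depends only on x1; x3 and x4 are orthogonal to x1.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x3": [1, -1, -1, 1, 1, -1, -1, 1],
    "x4": [1, 1, -1, -1, -1, -1, 1, 1],
}
y = [3 * v for v in data["x1"]]
print(select_until_stable(data, y, lam=0.5))  # ['x1']
```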

In the next step (5th round), reverse causality checks guided the selection. For each remaining influence, a pair of individual models was built: one with the influence as input and the variable to analyze as response (wished roles), and one with these roles swapped (reversed roles). Using ordered logit regressions, only the influences whose wished-role models showed more explanatory power [79] (a larger R-squared or, for non-OLS regressions such as logit, ologit, meologit, etc., a larger McFadden's pseudo R-squared as reported by Stata; see the explanations by Professor Richard Williams of the University of Notre Dame, https://www3.nd.edu/~rwilliam/stats3/L05.pdf, accessed on 25 January 2023) and more information gain (smaller values for both AIC and BIC [80]) than the reversed-role models were retained. These acted as determinants (predictors).
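The comparison criteria can be written out from each model's log-likelihood, as sketched below: McFadden's pseudo R-squared = 1 - LL_model/LL_null, AIC = -2*LL + 2k, and BIC = -2*LL + k*ln(N). The rule that a direction is kept only when the wished-role model wins on all three criteria, the function names, and the log-likelihood values in the example are illustrative assumptions; in practice, these quantities come from the fitted Stata models.

```python
import math


def fit_stats(ll_model, ll_null, k, n):
    """McFadden pseudo R-squared, AIC and BIC from log-likelihoods.

    k: number of estimated parameters (slopes plus intercepts/cut points);
    n: number of observations used in the fit.
    """
    return {
        "pseudo_r2": 1 - ll_model / ll_null,
        "aic": -2 * ll_model + 2 * k,
        "bic": -2 * ll_model + k * math.log(n),
    }


def wished_direction_wins(wished, reversed_roles):
    """True if the wished-role model explains more and has better information criteria."""
    return (wished["pseudo_r2"] > reversed_roles["pseudo_r2"]
            and wished["aic"] < reversed_roles["aic"]
            and wished["bic"] < reversed_roles["bic"])


# Illustrative (made-up) log-likelihoods for one influence.
wished = fit_stats(ll_model=-50.0, ll_null=-100.0, k=3, n=100)
reversed_roles = fit_stats(ll_model=-80.0, ll_null=-100.0, k=3, n=100)
print(wished["pseudo_r2"], wished["aic"])             # 0.5 106.0
print(wished_direction_wins(wished, reversed_roles))  # True
```

Because the number of cut points in an ordered logit depends on the number of response categories, k can differ between the two directions, which the AIC and BIC penalties take into account.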

The 6th selection phase focused on testing for collinearity between the remaining influences (those emerging after the 3rd phase) and the selected predictors (those resulting after the 4th). OLS regressions were used, and the computed VIF (variance inflation factor) was compared with the model's maximum accepted VIF threshold (Equation (1)) [81,82]. In addition, the maximum absolute values in the matrices of correlation coefficients (maxAbsVPMCC) [83] for both influences and predictors were evaluated [72,84].

$$\text{Model's maximum accepted VIF} = 1/(1 - \text{model's R-squared}) \tag{1}$$
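Equation (1) and the VIF check can be sketched as follows. The sketch relies on the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix (equivalently, 1/(1 - R_j^2) from regressing x_j on the other predictors); the correlation values and the model R-squared in the example are made up, and the function names are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(matrix)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def vifs(corr):
    """VIF_j = j-th diagonal element of the inverse predictor correlation matrix."""
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]


def max_accepted_vif(model_r2):
    """Equation (1): the model's maximum accepted VIF."""
    return 1.0 / (1.0 - model_r2)


def collinearity_flag(corr, model_r2):
    """True if the largest computed VIF exceeds the model's maximum accepted VIF."""
    return max(vifs(corr)) > max_accepted_vif(model_r2)


corr = [[1.0, 0.6], [0.6, 1.0]]  # hypothetical correlation between two predictors
print([round(v, 4) for v in vifs(corr)])  # [1.5625, 1.5625]
print(round(max_accepted_vif(0.18), 4))    # 1.2195
print(collinearity_flag(corr, 0.18))       # True
```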

Additionally, a prediction nomogram [85] was generated for the most robust remaining predictors using the *nomolog* command, installed beforehand with the following command: *net install st0391\_1, replace from (http://www.stata-journal.com/software/sj15-3)*.

Finally, each socio-demographic variable previously used for cross-validations served as a control in new models. These variables were added one at a time on top of the existing most robust model, which included the most resilient predictors emerging after the previous selection round.

All data processing and tests took place on a Windows Server Datacenter virtual machine (Intel Xeon Gold 6240 Cascade Lake CPU and ~32 GB of memory) in a private cloud. The reporting of the results mainly relied on the *estout* package (*ssc install estout, replace*), which provides the *eststo* command (for storing estimates) and the *esttab* command (for tabulating them in the console or as external files) [86,87], allowing the direct generation of tables with default performance metrics, as well as some additional ones [83], for well-known statistical models.

As the reviewers of this manuscript have suggested (and I thank them very much for this observation), there are significant differences between data mining and statistics. Among others, they concern the approaches and techniques used, the way propositions and hypotheses are stated (loosely vs. well-defined), and the type and volume of data considered (all available data vs. samples; several million to a few billion data points vs. hundreds to thousands). In addition, there are consistent differences between exploratory approaches and those specific to empirical science. This paper benefits from the advantages of all these categories, coupled with those emerging from comparing the results obtained this way with existing scientific theory.

#### **4. Results**

After performing the first selection step using adaptive boosting (in the Rattle library of R, https://rattle.togaware.com, accessed on 22 October 2022), a set of 38 variables resulted (Figure 1).

As seen in Figure 1, one way to look at the importance of the resulting variables is by considering their corresponding frequencies of use in the tree construction.

Before moving to the second selection step (filters on the absolute values of pairwise correlation coefficients), the variables were consolidated by finding and keeping only the sources of the following variables:

	- A008 (Feeling of happiness).
	- A165 (Can most people be trusted?).
	- E018 (Future changes such as greater respect for authority).
	- E025 (Political action such as signing a petition).
	- F063 (How important is God in your life?).
	- F118 (Is homosexuality justifiable?).
	- F120 (Is abortion justifiable?).
	- G006 (How proud of nationality?).
	- Y002 (Post-materialist index 4-item).
	- Y003 (Autonomy index).

After this consolidation point, 51 unique variables resulted: A008 (Section 4 (d) above), A029, A034, and A042 (Section 4 (b) above), A124\_06 (Neighbors: Immigrants/foreign workers), A124\_07 (Neighbors: People who have AIDS), A124\_09 (Neighbors: Homosexuals), A165 (Section 4 (d) above), A191 (It is important to this person living in secure surroundings), C001\_01 (Section 4 (b) above), C004 (Jobs scarce: older people should be forced to retire), C009 (First choice, if looking for a job), C038 (People who don't work turn lazy), D054 (Section 4 (a) above), D059, and D060 (Section 4 (b) above), D063\_B (Job best way for women to be independent), D066\_B (Problem if women have more income than husband), E001, E002, E003, and E004 (Section 4 (b) above), E018 (Section 4 (a) and above), E025 (Section 4 (d) above), E143 (Immigrant policy), E226 (Democracy: People choose their leaders in free elections), E247 (Priority: Global poverty versus National problems), F063, F118, and F120 (Section 4 (d) above), F121 (Section 4 (b) above), G006 (Section 4 (d) above), G007\_36\_B (Trust: People of another nationality), G015 and G015B (citizenship), G016 (Language at home), G017 (birth country), G027A (Respondent immigrant), G059 (Effects of immigrants on the development of own country), G061 (Measures taken by the government when people from other countries are coming here to work), S003 (ISO 3166-1 numeric country code), S006 (Original respondent number), S007 (Unified respondent number), S010 (Total length of interview), S016 (Language in which interview was conducted), S018 (weight), S020 (Year of survey), S021 (Country-wave-study-set-year), X048ISO (Counties and Country Macroregions ISO 3166-2), Y002, and Y003 (Section 4 (d) above).

After the second selection phase, which filtered on the absolute values of pairwise correlation coefficients (≥0.1), together with their significance (*p* < 0.001) and support (at least a third of the data, i.e., N > 142,150), 19 variables remained, as indicated in Table 1. The same results were obtained more easily using the PCDM command in Stata [73] (Stata script at https://tinyurl.com/25pd6mx6, accessed on 30 January 2023) with three parameters (minacc (0.1) minn (142,150) maxp (0.001)) corresponding to the three filters above.

**Table 1.** Tabular view of the results of the second selection round based on magnitude of correlation coefficients, support, and significance.


Before the third selection step (dedicated to cross-validations on specified criteria), the remaining variables (all 19 in Table 1) were recoded (the "nt" suffix stands for null treatment). The variables to be used as clustering criteria in cross-validations or for further controls were recoded as well. The main concern here was to remove missing and DK/NA answers and adapt the scales to the original meaning of the source questions (Listing A1 and Tables A1 and A2, Appendix A).

The third selection phase, based on mixed-effects modeling, identified the influences that remained robust regardless of the clustering criterion chosen from the set of socio-demographic variables (bottom of Listing A1, lines 49–70, Appendix A), including the year of the survey (S020, which did not require processing). Only ten of the 19 influences proved robust in this round (Table A3), namely: A124\_06nt, C001\_01nt, C009nt, C038nt, D054nt, D059nt, E143nt, F118nt, G007\_36\_Bnt, and Y002nt. The remaining nine influences failed in at least one scenario (A124\_07nt: models 6, 9, and 11–22; A124\_09nt: models 6, 7, 10, 11, and 22; A165nt: model 11; D060nt: models 2–11, 21, and 22; E025nt: models 1–8, 10–19, 21, and 22; F063nt: models 9 and 20; F120nt: models 9, 20, and 22; F121nt: models 9, 11, 20, and 22; Y003nt: models 1–11, 12–15, and 17–22).

The fourth selection round (Stata script at https://tinyurl.com/4x3ez5y9, accessed on 30 January 2023) applied CVLASSO and RLASSO to the remaining ten variables. No variable was dropped.

The fifth selection round was dedicated to reverse causality checks. Using ordered logit models (Table A4) and focusing on predictors/determinants (i.e., taking the direction of each influence into account), it removed one of the remaining ten influences: A124\_06Cnt (Neighbors: Immigrants/foreign workers).

The sixth selection round, aimed at detecting collinearity (the OLS maximum computed VIF exceeding the OLS maximum accepted VIF), eliminated two more variables (D059nt and F118nt; Table A5). Four matrices of correlation coefficients (only for the predictors in Models 1 and 2, 5 and 6, 9 and 10, and 15; Figure 2) were also produced. D054nt was temporarily removed (Models 9 and 10) because it was collinear with F118nt, which yielded higher accuracy and a higher R-squared (Model 7 vs. Model 8 in Table A5). However, after F118nt was later removed (being collinear with C001\_01nt, Models 11 and 12), D054nt was added back (Logit Model 15 had the highest accuracy, AUC-ROC = 0.7852) and generated no collinearity (Table A5, Model 16).

**Figure 2.** Assessing collinearity using consecutive matrices with correlation coefficients only for predictors (Stata script at https://tinyurl.com/ueefxfmd, accessed on 30 January 2023).

When cross-validating again (second stage: Stata script at https://tinyurl.com/mwb6nher, accessed on 30 January 2023) starting from these seven remaining determinants and the same clustering criteria for cross-validations (including counties and country macroregions, X048WVSnt), no loss in selection occurred.

In terms of support (Stata script at https://tinyurl.com/f868yab4, accessed on 30 January 2023), more than 45,000 observations from a single wave were used in most cases. This is because all seven predictors and the response variable were jointly available only in Wave 5 (2005–2009).

A prediction nomogram (Figure 3, *nomolog* command in Stata) based on binary logistic regression (Table A5, Model 15) supported the visual interpretation of all seven remaining determinants. This seven-predictor model yielded an R<sup>2</sup> of 0.1799 and fair-to-good accuracy (AUC-ROC of 0.7852). The maximum theoretical probability, reached for the most advantageous combination of variable values (Figure 3), exceeds 0.99. It corresponds to a total score of 39.55 (second X-axis, bottom of Figure 3), the top-down sum of 3.5, 6.75, 7.6, 4.6, 4.4, 2.7, and 10; these values are easily read by drawing perpendiculars to the first X-axis (Score). Other combinations of values of these seven most important predictors (e.g., the right edge of Figure 3) yield lower total scores (e.g., 21.95), indicating less critical cases and a lower corresponding probability (e.g., >0.8) of prioritizing the nation's people over immigrants regarding access to jobs. The nomogram also suggests the magnitude of the marginal effects of the seven robust determinants (visually, as segments corresponding to a unit difference on any scale; Figure 3 and Model 1, Table A7, Appendix A). In addition, the amplitude of each scale in this representation helps in understanding the cumulative effect size.
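The arithmetic behind the nomogram can be sketched as follows. The points read off Figure 3 add up to the reported total, and a generic scaling (the predictor with the widest |coefficient| × range span receives 10 points, the others proportionally fewer, and a total score maps back to the logit scale) shows how such scores relate to probabilities. This scaling is a common nomogram convention assumed for illustration, not necessarily the exact internals of *nomolog*; the coefficients in the example are made up and the function names are hypothetical.

```python
import math

# Points read off Figure 3 (top-down) for the highest-probability combination.
figure3_points = [3.5, 6.75, 7.6, 4.6, 4.4, 2.7, 10]
print(round(sum(figure3_points), 2))  # 39.55


def nomogram_scale(coefs, ranges, max_points=10.0):
    """Points per unit of linear predictor, so the widest predictor spans max_points.

    coefs: logit coefficients; ranges: (min, max) of each predictor.
    Returns (scale, points at each predictor's highest-risk end).
    """
    spans = [abs(b) * (hi - lo) for b, (lo, hi) in zip(coefs, ranges)]
    scale = max_points / max(spans)
    return scale, [s * scale for s in spans]


def probability_from_points(total_points, lp_at_zero_points, scale):
    """Map a total score back to a probability through the logistic function.

    lp_at_zero_points is the linear predictor of the lowest-risk combination
    (every predictor at its zero-point end).
    """
    lp = lp_at_zero_points + total_points / scale
    return 1.0 / (1.0 + math.exp(-lp))


# Hypothetical two-predictor model.
scale, points = nomogram_scale([0.5, -1.0], [(0, 4), (0, 1)])
print(scale, points)                              # 5.0 [10.0, 5.0]
print(probability_from_points(10.0, -2.0, 5.0))   # 0.5
```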

**Figure 3.** Risk prediction nomogram corresponding to the most resilient predictors (generated using the *nomolog* command in Stata).

Further controls (Table A6, Appendix A) are based on all seven most resilient predictors (Figure 3) and each of the eleven socio-demographic variables already used in cross-validations. All confirmed the robustness of the already identified hepta-core base model (Figure 3 and Models 1 and 13, Table A6, Appendix A), but only seven of these variables proved significant (Models 2, 6–9, 11, 12, 14, 18–21, 23, and 24, Table A6, Appendix A). The best models here are those additionally emphasizing the role of settlement size (X049nt; Model 11, based on a logit regression, and Model 23, based on an ologit one; Table A6, Appendix A). Compared with the base models (containing only the seven predictors; Models 1 and 13, Table A6, Appendix A), they have the highest McFadden's pseudo R-squared (0.1937 for logit and 0.1108 for ologit) and AUC-ROC (0.7946), and the lowest AIC (29162.5254 and 58024.8556) and BIC (29238.7119 and 58110.7761).

Moreover, only for these seven additionally confirmed influences were the corresponding models also reported with coefficients computed as average marginal effects (Table A7, Appendix A) and with direct references to the hypothesis codes. The performance metrics (e.g., pseudo R-squared, AUC-ROC, AIC, and BIC) are the same as for Models 1, 2, 6–9, 11, and 12 (Table A6, Appendix A). The coefficients in Table A7 (Appendix A, reported immediately above the errors in round parentheses) follow a simple interpretation rule. Each value indicates the average change in the probability of finding it acceptable for employers to prioritize co-nationals over immigrants that is associated with a one-unit increase in the corresponding variable (for a given model): an increase for positive coefficients and a decrease for negative ones. On the probability scale, this change equals the coefficient; expressed in percentage points, it equals the coefficient multiplied by 100 (e.g., 0.05 corresponds to 5 percentage points).
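For a logit model, the average marginal effect of a continuous predictor is the sample mean of beta * p * (1 - p), where p is each observation's predicted probability; in Stata, such effects are typically obtained with `margins, dydx(*)` after estimation (factor-variable predictors would instead use discrete changes). The sketch below computes this quantity from a coefficient and a list of linear predictors and converts it to percentage points; the inputs are made up and the function names are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def average_marginal_effect(beta_j, linear_predictors):
    """AME of a continuous predictor in a logit model: mean of beta_j * p * (1 - p)."""
    effects = [beta_j * logistic(z) * (1.0 - logistic(z)) for z in linear_predictors]
    return sum(effects) / len(effects)


def to_percentage_points(ame):
    """An AME on the probability scale, expressed in percentage points."""
    return 100.0 * ame


# Hypothetical coefficient and linear predictors (x_i'b) for four respondents.
ame = average_marginal_effect(0.4, [-1.0, 0.0, 0.5, 2.0])
print(round(ame, 4), round(to_percentage_points(ame), 2))
```

The same coefficient yields a larger AME when predicted probabilities cluster near 0.5, because p * (1 - p) peaks there; this is why AMEs are reported alongside, rather than instead of, the logit coefficients.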

#### **5. Discussion**

In terms of magnitude (the descending order of scale amplitudes is visible in Figure 3), the most important of these seven predictors is the attitude towards gender inequality in access to jobs. People who agree that men should have more right to a job than women are also more likely to accept the idea that employers should prioritize co-nationals over immigrants in case of job scarcity (positive influence, with the maximum value of 2 on the right in Figure 3). Thus, the attitude towards the first type of inequality in access to jobs (the gender-related one) is the best predictor of the attitude towards the second type (the immigration-related one). This finding is in line with the relations between gender and migration already documented in the scientific literature regarding various kinds of discrimination [88–90].

The second most important determinant by the same magnitude criterion seems to be the permissiveness of the immigration policy. As expected, those who favor a less permissive policy are also more likely (negative influence, with the minimum value of 0 on the right in Figure 3) to accept the idea of prioritizing citizens over immigrants in the event of job shortages (validation of H1). Although this finding seems almost obvious, the relationship between migration policy and job discrimination is complex and well studied [91–93].

The third most potent predictor (Figure 3 and Model 15 in Table A5) is the level of trust in people of another nationality. People with lower trust of this type are also more likely (negative influence, with the minimum value of 0 on the right in Figure 3) to accept that employers should prioritize citizens over immigrants when jobs are lacking. This is in line with the findings of other scholars [94–96] and contributes to the validation of H2.

The fourth strongest determinant corresponds to extrinsic motivation: one of the respondent's main life goals is to make their parents proud. This item is also known as devoutness and is partially related to traditions, given the interpretation of familism as one of their foundations [97]. It has a positive influence on the response variable, with its maximum value of 3 on the right in Figure 3. Thus, people more motivated in this way (that is, more devoted to their parents in these terms) are also more likely to prioritize their co-nationals when jobs are scarce. This finding is consistent with the existing scientific literature [98,99]. It also holds when both items are linked to the notion of power distance. More specifically, the item on making one's parents proud is assumed to extend power distance to the family, as it captures the concepts of obedience and hierarchy within the family, whereas the item on whether nationals should be privileged over immigrants when jobs are scarce relates directly to the definition of power distance. The particular way devoutness works thus contributes to validating H3.

The fifth most important predictor is the level of agreement with the idea that people who do not work turn lazy (also a positive influence, with the maximum value of 4 on the right in Figure 3). People more inclined to accept this attitude towards work are also more protective of co-nationals' access to jobs. This finding complements others in the scientific literature that reveal the limitations of migrant working identity [53,100].

The sixth most potent determinant is the four-item post-materialist index, which has a negative influence (the minimum of 1 on the right in Figure 3). People with a lower appetite for postmaterialist values, that is, with less need for independence and for fulfilling personal objectives in life [101], are more likely to prioritize their co-nationals over immigrants in access to employment. This finding is in line with those of [102], which associate subjective well-being with endorsement of democracy, greater emphasis on postmaterialist values, and greater tolerance of immigrants and of members of different racial and ethnic groups.

The specific way these two predictors function fully validates H4.

The last of the seven predictors in terms of magnitude is the preference for a job that benefits the community rather than the individual (a negative influence, with the minimum value of 1 on the right in Figure 3). People less inclined to prefer community-oriented jobs, who are instead oriented towards individual benefits or are simply more selfish [103], are more inclined to protect co-nationals in case of job shortages. This contributes to the full validation of H2.

Next, the seven resilient predictors previously found (Figure 3) served as the base for further controls (Table A6, Appendix A). These controls covered all the socio-demographic criteria involved in cross-validations, of which only seven proved significant.

First, the gender influence (Models 2 and 14, Table A6, Appendix A) indicates that female respondents are more protective of citizens relative to immigrants regarding access to jobs: women are more likely than men to consider it justifiable for employers to prioritize the people of their nation. This is in line with some findings in the literature [104,105] and contradicts others [106].

Another socio-demographic variable found significant is the income scale (Models 9 and 21, Table A6, Appendix A). Its negative sign indicates that those who earn more are less inclined to consider it justifiable for employers to prioritize nationals over immigrants. This agrees with the findings of Chandler and Tsai (2001) [107], Tucci (2005) [108], Tavakoli and Chatterjee (2021) [109], and Ruhs (2018) [110]; for the last author, it holds especially for high-skilled migrants. The same applies to respondents with a higher education level (Models 7 and 19, Table A6, Appendix A), again in line with Tavakoli and Chatterjee (2021) [109], who concluded that an additional level of education raises an individual's earnings and family income and thus brings better financial welfare and security, which in turn reduces the perceived economic threat of immigrants. The same holds for respondents whose employment status is closer to a full-time job (Models 8 and 20, Table A6, Appendix A), whereas the opposite (a positive coefficient) holds for those with more children (Models 6 and 18, Table A6, Appendix A). These last two findings are consistent with the income dependence of the response variable and with studies showing that people in higher-income groups are more tolerant towards immigrants [111], hold more positive attitudes towards them [112], and show significantly lower levels of welfare chauvinism [113].

Another significant control variable is settlement size (Models 11 and 23, Table A6, Appendix A). It contributes to the best models with eight predictors (the hepta-core plus one additional control), as measured by the largest McFadden pseudo R-squared and AUC-ROC and the lowest AIC and BIC, as already emphasized at the end of the Results section above. Its negative sign shows that people from larger communities (bigger cities) are less inclined to consider it acceptable for employers to prioritize nationals to the detriment of immigrants. This finding holds for Europe, where such respondents are more likely to have tolerant attitudes towards immigrants [111]. Similarly, for Canada, other scholars [114] highlighted a particularity of large urban areas compared with small ones, namely, the existence of immigrant service providers and language-training venues. By contrast, in Russia, for example, people living in the countryside are the least xenophobic, while the population of big cities is the most xenophobic [115]. Taken together, these results partially validate H5, since some socio-demographic variables (e.g., age, marital status) were not found to be significant.
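The model ranking above relies on four standard fit and discrimination scores. As a reminder of how they are defined, here is a minimal Python sketch; it is not the author's Stata workflow, the function names are illustrative, and any log-likelihood values passed to it are placeholders rather than results from this study. Here `k` counts all estimated parameters, including the intercept.

```python
import math


def mcfadden_r2(ll_model: float, ll_null: float) -> float:
    """McFadden pseudo R-squared: 1 - lnL(full model) / lnL(intercept-only model)."""
    return 1.0 - ll_model / ll_null


def aic(ll_model: float, k: int) -> float:
    """Akaike information criterion; k = number of estimated parameters."""
    return -2.0 * ll_model + 2.0 * k


def bic(ll_model: float, k: int, n: int) -> float:
    """Bayesian information criterion; n = number of observations."""
    return -2.0 * ll_model + k * math.log(n)


def auc_roc(y_true, scores) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise statistic.

    Equals the probability that a randomly chosen positive case gets a higher
    predicted score than a randomly chosen negative case (ties count 0.5).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Adding a control improves the model under this ranking when McFadden's R-squared and the AUC rise while AIC and BIC fall; BIC penalizes each extra parameter by ln(n) rather than 2, so it is the stricter of the two criteria in large samples such as the WVS.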

Given its positive coefficient, the last significant control variable (the survey year, Models 12 and 24, Table A6, Appendix A) points to a relevant finding: despite undeniable globalization and the rise of multiculturalism, people have increasingly come to believe over time that it is acceptable for employers to prioritize citizens over immigrants. This contradicts studies of general attitudes towards immigration [116] or of the integration of immigrants [117] that focused on specific regions and covered shorter periods.

As expected, due to its nature (nominal numerical codes originally unrelated to a specific intensity scale), the variable corresponding to the counties and country macroregions (X048WVS—in the given form) in which the interview took place did not prove to be statistically significant as a control variable. Still, it has proven to be extremely important [118,119] for cross-validations. The same argument (numerical codes originally unrelated to an intensity scale but useful for cross-validation) applies to the values of the variable corresponding to the country code (S003—ISO 3166-1 numeric country code). Still, the latter was identified in the first selection round (adaptive boosting—right side of Figure 1). Therefore, differences among countries are expected beyond these seven common predictors, referred to as a hepta-core model. However, the specific features of countries and particular regions (e.g., a dummy variable referring to whether a country is ex-communist or not [120], some country-dependent measures of economic activity such as GDP or the ratio between stock market capitalization and GDP defined in The World Bank Data Catalog or even the Worldwide Governance Indicators defined by Kaufmann et al. in 2010 [121] and used in many other studies, including recent ones [122,123]) will be the object of future research on the same topic but with more focus on certain local peculiarities.

#### **6. Conclusions**

An accurate model with seven strong influences emerged in this paper. Because they passed reverse causality checks, these influences act more as determinants. They characterize a specific type of World Values Survey respondent: one less likely to consider it acceptable for employers to prioritize their own people over immigrants. Such respondents are those who hold emancipative values (namely, gender equality for jobs), choose a profession more relevant for the community than for themselves, disagree that people who do not work turn lazy, score higher on inverse devoutness (being less inclined to make their parents proud), favor a less prohibitive immigration policy, trust people of another nationality more, and have a profile corresponding to a higher value on the post-materialist index. In addition, the controls generally emphasized the positive roles of three socio-demographic variables, namely female gender, the number of children, and the survey year, and the negative roles of education level, employment status (in terms of involvement in a full-time job), income scale, and settlement size (the control variable that adds the most performance to the basic hepta-core model) in whether it is considered justifiable for employers to prioritize the people of their nation over immigrants. By allowing visual interpretation of the seven most resilient determinants, the prediction nomogram presented in this paper serves both as a powerful probability identification instrument and as a decision support tool for management systems under conditions of uncertainty and risk. All conclusions related to the identified determinants rest on models with fair-to-good classification accuracy, obtained after many selection rounds and robustness checks.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** The data used in this study belongs to the World Values Survey, which conducted surveys following the Declaration of Helsinki.

**Informed Consent Statement:** The World Values Survey obtained informed consent from all subjects involved in the study.

**Data Availability Statement:** The dataset used in this study belongs to the World Values Survey; it is the .dta file inside the "WVS TimeSeries 1981 2020 Stata v1 6.zip" archive (https://www.worldvaluessurvey.org/WVSDocumentationWVL.jsp, accessed on 22 October 2022, via the "Data and Documentation" menu, the "Data Download" option, and the "Timeseries (1981–2022)" entry).

**Acknowledgments:** The author would like to thank the World Values Survey and supporting projects for allowing the exploration of the dataset and agreeing to the publication of the research results. In terms of technical assistance (https://cloud.raas.uaic.ro, accessed on 22 October 2022, a private cloud of the Alexandru Ioan Cuza University of Iași, Romania), this paper benefited from the support of the Competitiveness Operational Programme Romania, more precisely, project number SMIS 124759, RaaS-IS (Research as a Service Iasi), id POC/398/1/124759.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **Appendix A**

**Listing A1.** Recoding the remaining variables using a Stata script with numbered lines (numbers displayed separately, as when opened with the Stata editor; Stata script at: https://tinyurl.com/5n6bdfss, accessed on 30 January 2023).


```
21. generate C009nt=.
22. replace C009nt=C009 if C009!=. & C009>0 //or First choice, if looking for a job: 1. good income, 2. safe job, 3. wrk & people u like, 4. Do an import. job, 5. Do someth. for community
23. generate C038nt=.
24. replace C038nt=5-C038 if C038!=. & C038>0 //or People who don't work turn lazy
25. generate D054nt=.
26. replace D054nt=4-D054 if D054!=. & D054>0 //or One of main goals in life has been to make my parents proud (source for Y011C=DEVOUT- Welzel defiance-3: Inverse devoutness)
27. generate D059nt=.
28. replace D059nt=4-D059 if D059!=. & D059>0 //or Men make better political leaders than women do (source for Y022B=WOMPOL- Welzel equality-2: Gender equality: politics)
29. generate D060nt=.
30. replace D060nt=4-D060 if D060!=. & D060>0 //or University is more important for a boy than for a girl (source for Y022C=WOMEDU- Welzel equality-3: Gender equality: education)
31. generate E025nt=.
32. replace E025nt=3-E025 if E025!=. & E025>0 //or Political action: Signing a petition
33. generate E143nt=.
34. replace E143nt=4-E143 if E143!=. & E143>0 //or Immigrant policy: 1 Let anyone come . 4 Prohibit people from coming
35. generate F063nt=.
36. replace F063nt=F063 if F063!=. & F063>0 //or How important is God in your life
37. generate F118nt=.
38. replace F118nt=F118 if F118!=. & F118>0 //or Justifiable: Homosexuality
39. generate F120nt=.
40. replace F120nt=F120 if F120!=. & F120>0 //or Justifiable: Abortion
41. generate F121nt=.
42. replace F121nt=F121 if F121!=. & F121>0 //or Justifiable: Divorce
43. generate G007_36_Bnt=.
44. replace G007_36_Bnt=4-G007_36_B if G007_36_B!=. & G007_36_B>0 //Trust: People of another nationality (B)
45. generate Y002nt=.
46. replace Y002nt=Y002 if Y002!=. & Y002>0 //or Post-Materialist index 4-item: 1 Materialist, 2 Mixed, 3 Postmaterialist
47. generate Y003nt=.
48. replace Y003nt=2+Y003 if Y003!=. & Y003>-5 //or Autonomy Index: -2 Obedience/Religious Faith .. 2 Determination, perseverance/Independence
49. *FOR BUILDING CLUSTERS WHEN PERFORMING CROSS-VALIDATIONS:
50. generate X001nt=.
51. replace X001nt=X001 if X001!=. & X001>0 //Gender
52. generate X003nt=.
53. replace X003nt=X003 if X003!=. & X003>0 //Age
54. generate X007nt=.
55. replace X007nt=8-X007 if X007!=. & X007>0 //Marital status
56. generate X007bin=.
57. replace X007bin=1 if X007==1 | X007==2
58. replace X007bin=0 if X007!=. & X007>2 //Marital status as with someone or not
59. generate X011nt=.
60. replace X011nt=X011 if X011!=. & X011>=0 //How many children do you have
61. generate X025nt=.
62. replace X025nt=X025 if X025!=. & X025>0 //Highest educational level attained
63. generate X028nt=.
64. replace X028nt=8-X028 if X028!=. & X028>0 & X028<9 //Employment status
65. generate X047nt=.
66. replace X047nt=X047 if X047!=. & X047>0 //Scale of incomes
67. generate X048WVSnt=.
68. replace X048WVSnt=X048WVS if X048WVS!=. & X048WVS>0 //Regions
69. generate X049nt=.
70. replace X049nt=X049 if X049!=. & X049>0 //Settlement size
```
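Most lines of Listing A1 follow one pattern: keep a WVS item only when it is non-missing and carries a valid code, optionally reversing or shifting the scale so that higher values point in a chosen direction. The Python sketch below restates that pattern for readers who do not use Stata. The function name `recode` and its parameters are hypothetical, `None` stands in for Stata's missing value, and the sketch assumes, as the `>0` conditions suggest, that non-substantive WVS answers are stored as negative codes.

```python
import math
from typing import Optional


def recode(value: Optional[float], offset: float = 0.0, reverse: bool = False,
           lower: float = 0.0, inclusive: bool = False,
           upper: Optional[float] = None) -> Optional[float]:
    """Mirror one generate/replace pair from Listing A1 (illustrative only).

    Returns None (standing in for Stata's missing value) unless `value` is
    non-missing, greater than `lower` (or equal to it when `inclusive`), and,
    if `upper` is given, strictly below `upper`. Valid values become
    offset - value when `reverse` is True, otherwise offset + value.
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    if value < lower or (value == lower and not inclusive):
        return None
    if upper is not None and value >= upper:
        return None
    return offset - value if reverse else offset + value
```

For example, line 26 corresponds to `recode(d054, offset=4, reverse=True)`, which maps the 1..4 agreement scale onto 3..0; line 48 to `recode(y003, offset=2, lower=-5)`; line 60 to `recode(x011, inclusive=True)`, which keeps zero children as a valid answer; and line 64 to `recode(x028, offset=8, reverse=True, upper=9)`.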

**Table A1.** The most relevant items of this study.




Source: WVS data.

**Table A2.** Descriptive statistics for the most relevant WVS items used in this study.


Source: own calculation in Stata (Stata script at https://tinyurl.com/yt872hcs, accessed on 31 January 2023).


**Table A3.** The results of cross-validations on some socio-demographic variables using mixed-effects binary (first 11 models) and ordered logit models.

Source: own calculation in Stata (Stata script at https://tinyurl.com/susvkppj, accessed on 30 January 2023). Notes: var(\_cons[]) relates to the criterion. Robust standard errors are between round parentheses. The raw coefficients emphasized using \*, \*\*, and \*\*\* are significant at 5%, 1%, and 1‰. Red vs. green indicates a loss of significance (variables not selected) vs. the opposite (selected variables).



**Table A4.** The results of the first stage of reverse causality checks using ordered logit.

Notes: robust standard errors are between round parentheses. The raw coefficients emphasized using \*\*\* are significant at 1‰. Colors are applied to emphasize better model scores and selected variables (green) and lower model scores and variables not selected (red).


**Table A5.** Identified collinearity issues.


**Table A6.** Controlling using the most relevant seven remaining predictors (hepta-core) and most of the socio-demographic variables in logit models.



**Table A7.** The average marginal effects identified after controlling using the most relevant seven predictors (hepta-core) and each of the other seven most significant socio-demographic control variables in logit models.

Source: own calculation in Stata (Stata script at https://tinyurl.com/yvc3py3u, accessed on 30 January 2023). Notes: robust standard errors are between round parentheses. Coefficients computed as average marginal effects and emphasized using \*\*\* are significant at 1‰. The H codes on the left indicate the hypotheses to which the adjacent variables belong.
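For readers unfamiliar with average marginal effects, the Python sketch below shows the textbook computation for a binary logit when a predictor is treated as continuous: the derivative of the predicted probability, `beta_j * p * (1 - p)`, averaged over all observations. It is an assumption that this matches how Table A7 was produced. Stata's `margins, dydx()` follows this logic for continuous regressors but uses discrete changes for factor variables. The function name and the toy inputs in the test are hypothetical.

```python
import numpy as np


def logit_ame(X: np.ndarray, beta: np.ndarray, j: int) -> float:
    """Average marginal effect of regressor j in a binary logit model.

    X is the (n, k) design matrix, including a column of ones when beta has
    an intercept. For a continuous regressor, dP(y=1)/dx_j equals
    beta_j * p * (1 - p); the AME is that derivative averaged over the n rows.
    """
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # predicted probabilities
    return float(np.mean(beta[j] * p * (1.0 - p)))
```

Because `p * (1 - p)` never exceeds 0.25, an AME is always smaller in absolute value than a quarter of the raw logit coefficient, which is why the entries of Table A7 read as changes in probability rather than in log-odds.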

#### **References**


67. Gorodzeisky, A.; Semyonov, M. Terms of exclusion: Public views towards admission and allocation of rights to immigrants in European countries. *Ethn. Racial Stud.* **2009**, *32*, 401–423. [CrossRef]


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Algorithmic Strategies for Precious Metals Price Forecasting**

**Gil Cohen**

Department of Management, Western Galilee Academic College, P.O. Box 2125, Acre 2412101, Israel; gilc@wgalil.ac.il

**Abstract:** This research is the first attempt to create machine learning (ML) algorithmic systems able to trade precious metals automatically. The algorithm uses three forecasting methodologies: linear regression (LR), Darvas boxes (DB), and Bollinger bands (BB). Our data consist of 20 years of daily price data on five precious metals futures: gold, silver, copper, platinum, and palladium. We found that the current daily returns of all the examined precious metals are negatively autocorrelated with their previous day's returns, and we identified lagged interdependencies among the examined metals. Silver futures prices were found to be best forecasted by our systems, and platinum the worst. Moreover, our systems forecast price uptrends better than downtrends for all examined techniques and commodities. Linear regression was found to be the best technique for forecasting silver and gold price trends, while the Bollinger band technique best fits palladium forecasting.

**Keywords:** precious metals; gold; silver; algorithmic trading; futures

**MSC:** 37M22

#### **1. Introduction**

The use of artificial intelligence (AI) in forecasting and trading financial asset prices has become increasingly common as the volume and speed of new financial data have increased dramatically. Algorithms are used to analyze data from multiple sources simultaneously. Such systems are developed by market experts and are usually applied to stock and currency markets. This research develops and tests such an AI system and applies it to the precious metals futures market. Investors have always perceived precious metals as a hedging tool against inflation (see, for example, [1]) or stock market crashes. We designed, optimized, and tested three algorithmic trading systems suitable for precious metal futures trading. The long span of our data enables us to test the performance of our systems under changing economic conditions. The technical analysis approach used here, commonly used by practitioners to trade stocks and foreign exchange, relies on historical data to forecast future prices. We used the particle swarm optimization (PSO) algorithm as our primary optimization tool because of its ability to handle multiple objectives simultaneously.

Many researchers have tried to demonstrate the ability of such algorithmic trading systems to achieve abnormal returns for stocks, currencies, and indices. However, most of them focus on stocks and foreign exchange and have partly neglected commodity futures, especially precious metal futures. This research aims to fill that gap with an insight into three algorithmic trading strategies programmed in accordance with the specific characteristics of precious metal financial markets. We use 20 years of daily futures data on five major precious metals, namely gold, silver, copper, platinum, and palladium, to test three algorithmic trading strategies: linear regression (LR), Darvas boxes (DB), and Bollinger bands (BB). We followed [2], which concluded that LR and DB could help traders predict short-term Bitcoin price trends. Our 20 years of data were split into 10 years of training and optimization and 10 years of testing the trading results. We found that it is possible to forecast short-term price trends of precious metals. Silver futures prices were found to be best forecasted by our systems, and platinum was the worst. Our systems forecast price uptrends better than downtrends for all examined techniques and commodities. Linear regression was found to be the best technique for forecasting silver and gold prices, while the Bollinger band technique best fits palladium forecasting.

**Citation:** Cohen, G. Algorithmic Strategies for Precious Metals Price Forecasting. *Mathematics* **2022**, *10*, 1134. https://doi.org/10.3390/math10071134

Academic Editors: Alexandru Agapie, Denis Enachescu, Vlad Stefan Barbu and Bogdan Iftimie

Received: 24 February 2022; Accepted: 30 March 2022; Published: 1 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **2. Literature Review**

Our system is based on pattern recognition, a developing AI field that helps us understand various chaotic phenomena. Ref. [3] argued that the applicability of Bayesian methods was greatly enhanced by the development of a range of approximate inference algorithms, such as variational Bayes and expectation propagation. An important foundation for learning an input–output mapping from a set of examples was presented by [4], who developed a theoretical framework for approximation based on regularization networks, which are closely related to pattern recognition; their methodologies included task-dependent clustering and dimensionality reduction. Other researchers explained the mathematical concepts behind forecasting methods based on probabilistic derivations. Ref. [5] provided a joint introduction to Gaussian processes (GPs) and relevance vector machines (RVMs, developed by [6]). They found that RVMs allow the choice of more general basis functions, whereas the behavior of their predictive variance is generally counterintuitive. Ref. [7] examined GP and RVM models and concluded that probabilistic models can produce predictive distributions instead of point predictions.

Most researchers who have tried to explain precious metals prices have done so by linking the stock market to the precious metals market. Ref. [8] explained that precious metal futures have higher returns when investor sentiment is pessimistic rather than optimistic. Ref. [9] argued that precious metals prices and their volatility are driven by shocks originating in the economic uncertainty and risk appetite of investors prevailing in the equity market. Other researchers focused on the interrelations between the prices of the leading precious metals. Ref. [10] showed that precious metals were strongly correlated with each other in the last decade. Ref. [11] documented that weekly changes in traders' positions have a destabilizing impact on subsequent conditional volatility in the gold, silver, and palladium futures markets.

Other researchers linked precious metals prices to each other and to other commodities. Ref. [12] examined spillover effects among six commodity futures markets and found that both gold and silver transmit information to other commodity futures markets. Ref. [13] examined the impact of oil price changes on precious metals prices and identified the safe-haven nature of precious metals against an oil price drop.

Past researchers have also attempted to construct AI systems to predict precious metals prices. Ref. [14] proposed a model that combines an adaptive neuro-fuzzy inference system with a genetic algorithm. Ref. [15] discovered hidden patterns governing the evolution of systems. Unlike these attempts, we designed algorithmic trading systems and tested their ability to predict precious metals prices.

#### **3. Data and Methodologies**

Our data consist of 20 years of daily open, close, high, and low prices of five precious metals futures. We used a lagged multi-dimensional stepwise regression model to examine lagged correlations between the daily returns of the examined precious metals, including autocorrelations, as described in Equation (1).

$$(G, S, C, P, Pa)_i = \beta_1 G_{i=-1\ldots-3} + \beta_2 S_{i=-1\ldots-3} + \beta_3 C_{i=-1\ldots-3} + \beta_4 P_{i=-1\ldots-3} + \beta_5 Pa_{i=-1\ldots-3} \tag{1}$$

where $(G, S, C, P, Pa)_i$ is the daily return of gold, silver, copper, platinum, and palladium, and $(G, S, C, P, Pa)_{i=-1\ldots-3}$ denotes the daily returns of gold, silver, copper, platinum, and palladium one to three days earlier.

The results of this model enabled us to better understand the short-term autocorrelations of returns and the lagged dependencies between precious metals price movements, and they helped us design our trading systems.
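Equation (1) regresses each metal's daily return on the one- to three-day lagged returns of all five metals. A minimal NumPy sketch of that design matrix and a plain OLS fit follows. It assumes the returns are already aligned in a `(T, m)` array, one column per metal (the column order and function names are hypothetical), adds an intercept that Equation (1) does not show, and omits the stepwise variable selection applied in the study.

```python
import numpy as np


def lagged_design(returns: np.ndarray, max_lag: int = 3) -> np.ndarray:
    """Stack lags 1..max_lag of every series into one regressor matrix.

    returns: array of shape (T, m), one column per metal.
    Row r of the output (day t = r + max_lag) holds returns[t-1], ...,
    returns[t-max_lag] for all metals, giving shape (T - max_lag, m * max_lag).
    """
    T = returns.shape[0]
    blocks = [returns[max_lag - lag:T - lag] for lag in range(1, max_lag + 1)]
    return np.hstack(blocks)


def fit_lagged_ols(returns: np.ndarray, target: int, max_lag: int = 3) -> np.ndarray:
    """OLS of one metal's daily return on all lagged returns plus an intercept.

    Coefficient order: intercept, then lag 1 of every metal, lag 2, and so on.
    """
    X = lagged_design(returns, max_lag)
    X = np.column_stack([np.ones(len(X)), X])
    y = returns[max_lag:, target]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

With this ordering, the own first-lag coefficient of metal `j` sits at index `1 + j`; a negative value there corresponds to the negative first-order autocorrelation reported in the abstract.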

#### *3.1. Algorithmic Trading System*

We designed our algorithmic trading system to report the actual trading results: net profit (NP), the percentage of profitable trades out of all trades (PP), and the profit factor (PF). NP is the dollar value of the total net profit generated by the trading system, PP is the percentage of winning trades out of all trades generated by the system, and PF is defined as gross profits divided by gross losses. We programmed three algorithmic systems based on three technical trading tools and altered their configurations until we achieved maximum profitability in terms of NP and PF. The systems are based on three methodologies, linear regression, Darvas boxes, and Bollinger bands, which are well-known technical formations commonly used by stock and currency traders to analyze investment opportunities. We then optimized NP and PF by altering the setups behind our systems and splitting each system's performance into long and short positions.
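The three reported metrics follow directly from the list of closed trades. A minimal Python sketch, assuming each trade's result is already known in dollars (the function name is hypothetical, and trades that exactly break even are counted as neither winners nor losers):

```python
def trading_metrics(trade_pnls):
    """Return (NP, PP, PF) for a list of closed-trade profits and losses in dollars.

    NP: sum of all trade results.
    PP: percentage of trades with a strictly positive result.
    PF: gross profits / gross losses (losses taken in absolute value);
        infinite when there are no losing trades.
    """
    if not trade_pnls:
        raise ValueError("at least one trade is required")
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    net_profit = sum(trade_pnls)
    pct_profitable = 100.0 * sum(1 for p in trade_pnls if p > 0) / len(trade_pnls)
    profit_factor = gross_profit / gross_loss if gross_loss > 0 else float("inf")
    return net_profit, pct_profitable, profit_factor
```

For example, trades of +300, -100, +200, and -100 dollars give NP = 300, PP = 50%, and PF = 500 / 200 = 2.5. A PF above 1 means that gross profits exceed gross losses, independently of how many trades were winners.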

The complexity of our systems requires multi-objective optimization. We selected particle swarm optimization (PSO), developed by Kennedy and Eberhart [16,17], as our primary optimization method. This methodology enabled us to train the system in the initial period and test it in the later period: the 20 years of our sample were split into two separate periods, 10 years of training and optimization and 10 years of testing and reporting results. We started the process with a random trading setup that included the trading time frames and the parameters of the various tools. Next, for each setup, we evaluated the fitness of the trading results against our predefined goals: maximum NP, PF, and PP. We then compared each result with its former maximum and set a new maximum if needed. The process is described by Equations (2) and (3).

$$V_{i+1,d} = V_{id} + C_1\,\mathrm{Rand} \times (P_{id} - X_{id}) + C_2\,\mathrm{Rand} \times (P_{gd} - X_{id}) \tag{2}$$

$$X_{i+1,d} = X_{id} + V_{i+1,d} \tag{3}$$

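where $V_{id}$ is the value (velocity) of each setup, $X_{id}$ is the current setup, Rand is a random number, $C_1$ and $C_2$ are acceleration constants, $P_{id}$ is the best setup identified so far by each particle, and $P_{gd}$ is the best (maximum) setup identified across all particles.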

Finally, we looped the process, updating each setup with Equation (3), until the multiple objectives reached their highest values.
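The velocity and position updates can be sketched as a basic global-best PSO in NumPy. This is an illustration under stated assumptions, not the author's optimizer: it maximizes a single scalar fitness (so NP, PF, and PP would first have to be combined into one number, for example by a weighted sum chosen by the user), it adds an inertia weight `w` and the commonly used values w = 0.729 and c1 = c2 = 1.49445 so that the swarm converges without velocity clamping (w = 1 gives an update of the form of Equation (2)), and it clips setups to user-supplied bounds.

```python
import numpy as np


def pso_maximize(fitness, lower, upper, n_particles=30, n_iter=200,
                 w=0.729, c1=1.49445, c2=1.49445, seed=0):
    """Maximize `fitness` over a box with a basic global-best PSO."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))  # random initial setups
    v = np.zeros_like(x)                                     # initial velocities
    p_best = x.copy()                                        # best setup per particle (P_id)
    p_val = np.array([fitness(xi) for xi in x], dtype=float)
    g_idx = int(np.argmax(p_val))
    g_best, g_val = p_best[g_idx].copy(), p_val[g_idx]       # best setup overall (P_gd)
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity update; with w = 1 this has the form of Equation (2).
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        # Position update as in Equation (3), kept inside the allowed bounds.
        x = np.clip(x + v, lower, upper)
        vals = np.array([fitness(xi) for xi in x], dtype=float)
        improved = vals > p_val
        p_best[improved] = x[improved]
        p_val[improved] = vals[improved]
        g_idx = int(np.argmax(p_val))
        if p_val[g_idx] > g_val:
            g_best, g_val = p_best[g_idx].copy(), p_val[g_idx]
    return g_best, g_val
```

In a trading application, `fitness` would run a backtest over the training period for the setup it receives (rounding day counts to integers) and return the combined score; the winning setup would then be evaluated once on the testing period.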

#### *3.2. Linear Regression Strategy*

Figure 1 demonstrates how we used the linear regression technique for algorithmic trading platforms.

**Figure 1.** Linear regression algorithmic trading strategy. Notes: Every candlestick in Figure 1 represents the daily high/low and open/close prices of the commodity futures. The middle line represents the linear regression line, while the other two lines lie one standard deviation above and below it.

A linear regression strategy requires two parameters: the length of time used to form the line and the distance from that line that determines entry into and exit from trading positions. The regression line in Figure 1, for example, is based on 50 trading days, and one standard deviation from that line determines the entry and exit points of the trading position. We started our PSO procedure with random values for both the time length (in days) and the span that determines the actual entries and exits of trades, and the system altered those variables to maximize our trading targets.
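A regression channel like the one in Figure 1 can be sketched as follows. The `window` (50 days in the example above) and the band width `k` (one standard deviation) are the two values the PSO step would tune. The entry rule in `channel_signal` is a hypothetical mean-reversion choice, since the text does not state which band triggers a long or a short position; the residual standard deviation uses `ddof=0`, which is also an assumption.

```python
import numpy as np


def regression_channel(closes, window=50, k=1.0):
    """Linear-regression channel over the last `window` closes.

    Returns (mid, upper, lower) evaluated on the most recent day: mid is the
    fitted regression value, and the bands sit k residual standard deviations
    above and below it.
    """
    y = np.asarray(closes[-window:], dtype=float)
    if y.size < window:
        raise ValueError("not enough data for the window")
    t = np.arange(window, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)   # least-squares straight line
    fitted = slope * t + intercept
    sd = np.std(y - fitted)                  # residual standard deviation
    mid = fitted[-1]
    return mid, mid + k * sd, mid - k * sd


def channel_signal(close, upper, lower):
    """Hypothetical rule: +1 (go long) below the lower band, -1 (go short) above the upper band."""
    if close < lower:
        return 1
    if close > upper:
        return -1
    return 0
```

A trend-following variant would simply swap the two signs; which reading matches the original platform cannot be determined from the text.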

#### *3.3. Darvas Boxes Strategy*

Figure 2 shows an example of an automated trading platform using Darvas boxes.

**Figure 2.** Darvas boxes algorithmic trading strategy. Notes: Every candlestick in Figure 2 represents the daily high/low and open/close Bitcoin prices. A green daily candle means that the close price is above the opening price, and a red candle means that the close price is below the opening price. The green and red lines indicate the upper and lower boundaries of the Darvas boxes.

Figure 2 shows how Darvas boxes are designed and how they generate long and short signals. This algorithmic trading system assumes that the trader is always exposed to the market, shifting between long and short positions. Darvas boxes rely on the notion that deviations from horizontal support and resistance lines formed over time can be used to construct a winning trading strategy. The idea is that the asset's price should move within a specific box formation when no external news is released and break out of the formation when important news concerning the commodity reaches the financial markets. Boxes can be formed using any predetermined time frame according to the financial asset's volatility: a highly volatile asset demands a shorter time frame for box formation than a low-volatility asset. The PSO process starts with a random number of days to construct the boxes and alters it to achieve better trading performance. Once the size and shape of the boxes are set in the training period, they are used in the testing period, for which performance is remeasured.
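Darvas's original rules confirm box tops and bottoms in several steps. The Python sketch below deliberately swaps in a simpler, Donchian-style approximation (the box for day t is the prior n-day high and low) only to illustrate how a breakout becomes a long or short signal and how the always-in-the-market position described above can be derived from those signals. The box length `n` stands in for the number of days that PSO tunes; the function names and the example prices are hypothetical.

```python
def box_breakout_signals(highs, lows, closes, n=3):
    """Simplified box-breakout signals (Donchian-style stand-in for Darvas boxes).

    For each day t >= n, the box is bounded by the highest high and the lowest
    low of the previous n days. Returns +1 when the close breaks above the box
    top, -1 when it breaks below the box bottom, and 0 otherwise.
    """
    if not (len(highs) == len(lows) == len(closes)):
        raise ValueError("highs, lows, and closes must have equal length")
    signals = [0] * len(closes)
    for t in range(n, len(closes)):
        top = max(highs[t - n:t])       # box ceiling from the prior n days
        bottom = min(lows[t - n:t])     # box floor from the prior n days
        if closes[t] > top:
            signals[t] = 1
        elif closes[t] < bottom:
            signals[t] = -1
    return signals


def always_in_market(signals):
    """Carry the latest non-zero signal forward so a position is always held."""
    position, positions = 0, []
    for s in signals:
        if s != 0:
            position = s
        positions.append(position)
    return positions
```

For example, with highs 10, 10, 10, 12, 12, lows 9, 9, 9, 11, 8, closes 9.5, 9.5, 9.5, 11.5, 8.5, and n = 3, the signals are 0, 0, 0, +1, -1: an upside breakout on day 4 followed by a downside breakout on day 5 that flips the position from long to short.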

#### *3.4. Bollinger Bands Strategy*

Bollinger bands (BB), developed by John Bollinger, are placed two standard deviations above and below a simple moving average. The trading strategy demonstrated in Figure 3 uses 14 days for the moving average calculation with the original two standard deviations. When the price of the commodity crosses the lower band, the system opens a long (buy) position, and when it crosses the upper band, a short (sell) order is generated.

**Figure 3.** Bollinger Bands algorithmic trading strategy. Notes: A green daily candle means that the close price is above the opening price, and a red candle means that the close price is below the opening price. The middle brown line is a simple moving average, and the blue lines are the upper and lower boundaries of the BB.

The PSO procedure starts with random setups for both the moving average length and the number of standard deviations and optimizes both parameters of our trading system.
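To show what a two-parameter PSO search might look like, here is a minimal global-best particle swarm sketch in plain Python. It is not the authors' implementation: the inertia and acceleration constants (0.7, 1.5, 1.5), the swarm size, the iteration count, and the toy objective are all assumptions. In a trading application, `objective` would run a backtest (for example, the net profit of a BB strategy after rounding the window to whole days) on the training data.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_maximize(objective: Callable[[List[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 20, n_iter: int = 100,
                 inertia: float = 0.7, c_personal: float = 1.5,
                 c_global: float = 1.5, seed: int = 0) -> Tuple[List[float], float]:
    """Global-best PSO; each particle is one candidate setup (e.g., days, std multiplier)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial setups inside the bounds, zero initial velocity
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    best_i = max(range(n_particles), key=lambda i: personal_score[i])
    global_best, global_score = personal_best[best_i][:], personal_score[best_i]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (inertia * velocities[i][d]
                                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                                    + c_global * r2 * (global_best[d] - positions[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the allowed range
                positions[i][d] = min(max(positions[i][d] + velocities[i][d], lo), hi)
            score = objective(positions[i])
            if score > personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score > global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


# Toy objective with a known optimum at (10 days, 2 standard deviations)
best, score = pso_maximize(lambda p: -((p[0] - 10) ** 2 + (p[1] - 2) ** 2),
                           bounds=[(2, 60), (0.5, 4)])
print(best, score)
```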

The three methodologies tested in this research are based on pattern recognition of the price movements of precious metals. The LR method fits a linear model (horizontal or diagonal) to the data and determines price direction from deviations from that linear formation. The DB methodology works on a shorter-term formation of boxes that represent horizontal support and resistance lines; a deviation from that formation can be used to identify shifts in price trends and support trading decisions. The concept behind the BB structure does not require identifying a predetermined formation; instead, it determines a zone within which the financial asset is expected to move over a specific time frame. A breakout of the price from that expected zone can indicate irregular price movements that can be exploited for profit.
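The text does not give the exact LR signal rule, so the following Python sketch assumes one plausible version: fit an ordinary least-squares line to the previous `window` closes, extend it one day ahead, and go long when today's close is above the projected value and short when it is below. The function names and the toy prices are hypothetical.

```python
from typing import List, Tuple


def ols_line(values: List[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of values against x = 0, 1, ..., n-1."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def linreg_positions(closes: List[float], window: int) -> List[int]:
    """+1 when the close is above the projected regression line, -1 when below."""
    if window < 2:
        raise ValueError("window must be at least 2 to fit a line")
    positions: List[int] = []
    position = 0
    for t, close in enumerate(closes):
        if t >= window:
            slope, intercept = ols_line(closes[t - window:t])
            projected = intercept + slope * window  # line extended to day t
            if close > projected:
                position = 1    # upward deviation from the linear formation
            elif close < projected:
                position = -1   # downward deviation from the linear formation
        positions.append(position)
    return positions


print(ols_line([1, 2, 3, 4]))                    # (1.0, 1.0)
print(linreg_positions([1, 2, 3, 5, 4], window=3))  # [0, 0, 0, 1, -1]
```

A longer `window` makes the fitted line less sensitive to recent noise, which fits the observation that the LR system needs more days than the other two methods.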

#### **4. Results**

We start the Results section by presenting the monthly and daily correlation matrices of the returns of the examined precious metals over the 10 years ending in April 2021.
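A minimal NumPy sketch of how such correlation matrices can be computed from a price panel (rows are days, columns are metals). The paper does not state whether it uses simple or log returns or how it defines a month, so log returns and non-overlapping 21-trading-day blocks as a stand-in for calendar months are assumptions; the toy prices are hypothetical.

```python
import numpy as np


def log_returns(prices: np.ndarray) -> np.ndarray:
    """Daily log returns for a (days x metals) price array."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices), axis=0)


def aggregate_returns(daily: np.ndarray, block: int = 21) -> np.ndarray:
    """Sum log returns over non-overlapping blocks (about one trading month each)."""
    n_blocks = daily.shape[0] // block
    trimmed = daily[: n_blocks * block]          # drop the incomplete last block
    return trimmed.reshape(n_blocks, block, -1).sum(axis=1)


def correlation_matrix(returns: np.ndarray) -> np.ndarray:
    """Pearson correlations between columns (one column per metal)."""
    return np.corrcoef(returns, rowvar=False)


# Toy panel: column 2 is column 1 squared, column 3 is its reciprocal
prices = np.array([[1, 1, 1], [2, 4, 0.5], [3, 9, 1 / 3], [5, 25, 0.2], [4, 16, 0.25]])
daily = log_returns(prices)
print(correlation_matrix(daily).round(3))
print(aggregate_returns(daily, block=2).shape)  # (2, 3)
```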

Table 1 shows that the monthly returns of all examined precious metals are positively correlated. At the daily level, however, the correlations between the metals' returns do not all share the same sign: gold and silver, as well as copper and silver, are negatively correlated, whereas platinum and palladium, as well as silver and platinum, are positively correlated. We then apply our multidimensional regression model (Equation (1)) to the daily data; the results of the standard stepwise regression are presented in Table 2. This model enables us to better understand the one- to three-day lag dependencies of each metal on its own previous price changes and on those of the other precious metals.
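Equation (1) is not reproduced in this section, so the following NumPy sketch assumes a generic form: each metal's daily return regressed on an intercept plus lags 1 to 3 of all metals' returns, estimated by plain OLS (no stepwise variable selection). It also reports the $R^2$ and t statistics defined in the notes to Table 2. The synthetic data, with a built-in lag-1 autocorrelation of -0.3, is hypothetical and only checks that the estimator recovers a known coefficient.

```python
import numpy as np


def lagged_design(returns: np.ndarray, max_lag: int = 3) -> np.ndarray:
    """Intercept plus lags 1..max_lag of every column of a (T x k) return array."""
    T, k = returns.shape
    cols = [np.ones(T - max_lag)]
    for lag in range(1, max_lag + 1):
        for j in range(k):
            cols.append(returns[max_lag - lag: T - lag, j])  # value at t - lag
    return np.column_stack(cols)


def fit_lagged_ols(returns: np.ndarray, target: int, max_lag: int = 3):
    """OLS coefficients, t statistics, and R^2 for one metal's current return."""
    X = lagged_design(returns, max_lag)
    y = returns[max_lag:, target]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    sigma2 = (resid @ resid) / (n - p)               # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se, r2


# Synthetic check: column 0 follows r_t = -0.3 r_{t-1} + e_t, column 1 is pure noise
rng = np.random.default_rng(42)
e = rng.normal(size=(5000, 2))
r = np.zeros((5000, 2))
r[:, 1] = e[:, 1]
r[0, 0] = e[0, 0]
for t in range(1, 5000):
    r[t, 0] = -0.3 * r[t - 1, 0] + e[t, 0]

beta, t_stats, r2 = fit_lagged_ols(r, target=0, max_lag=3)
print(beta[1], t_stats[1], r2)  # beta[1] is the own lag-1 coefficient
```

With five metals and three lags, the design matrix would have 1 + 5 × 3 = 16 columns; a stepwise procedure would then keep only the significant ones.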

Table 2 reveals an interesting phenomenon: all precious metals' current daily returns are negatively autocorrelated with their previous days' returns. This holds for gold and silver with respect to their returns over the three previous consecutive days, for platinum with respect to its returns over the two previous days, and for copper and palladium with respect to the previous day's return. In terms of interdependencies, Table 2 shows that gold's current daily returns are negatively affected by silver's previous days' returns. However, silver's current daily returns are positively correlated with gold's returns two and three days earlier. Platinum's current daily returns were found to be positively affected by the past returns of gold, silver, and palladium. Palladium's current daily return was found to be positively correlated with the previous day's returns of silver and platinum and with gold's return two days earlier. The daily autocorrelations described above helped us better understand the dynamics of daily prices when constructing our trading strategies. All the designed trading systems are based on daily trading data. However, because these strategies differ in nature, the number of days each one uses, which is determined solely by the optimization process, also differs. For example, the linear regression system needs more days than the other methodologies to construct its formations; it therefore requires a longer history to analyze price trends and produce profitable trading signals than the Darvas boxes and Bollinger bands systems, which are more dynamic and need fewer days to achieve their best performance.


**Table 1.** Correlations matrix of monthly and daily returns.

**Table 2.** Results of the regression model.


Notes: $(G, S, C, P, Pa)_i$ = daily returns of gold, silver, copper, platinum, and palladium; $(G, S, C, P, Pa)_{i=-1,\ldots,-3}$ = daily returns of gold, silver, copper, platinum, and palladium 1 to 3 days earlier. \* = significant at the 95% confidence level. $R^2$ = the proportion of the variation in the dependent variable that is predictable from the independent variable(s). F = the F-statistic, which tests the overall fit of the model to the data. T stat = the ratio of the departure of the estimated value of a parameter from its hypothesized value to its standard error.

#### *4.1. Linear Regression Trading Strategy*

The linear regression strategy requires determining the number of days on which the linear regression line is formed. We start with a random number of days for each metal and optimize the trading results through our PSO system. The best trading results are summarized in Table 3 and Figure 4.


**Table 3.** Linear regression strategy trading results.

Notes: NP = Net profit, PP = Percent of profitable trades of all trades, PF = Profit factor, Days = The number of days on which the linear regression is constructed. \*\* = The highest NP.
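For readers who want to reproduce metrics like those in Table 3 from their own backtests, here is a minimal Python sketch. It assumes the common definitions (NP = gross profit minus gross loss, PP = share of trades with positive P&L, PF = gross profit divided by gross loss); the paper does not restate these formulas, and the trade list is hypothetical.

```python
from typing import Dict, List


def trade_metrics(trade_pnls: List[float]) -> Dict[str, float]:
    """Net profit (NP), percent profitable (PP), and profit factor (PF)."""
    if not trade_pnls:
        raise ValueError("at least one closed trade is required")
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)  # stored as a positive number
    winners = sum(1 for p in trade_pnls if p > 0)
    return {
        "NP": gross_profit - gross_loss,
        "PP": 100.0 * winners / len(trade_pnls),
        # With no losing trades, PF is unbounded
        "PF": gross_profit / gross_loss if gross_loss > 0 else float("inf"),
    }


print(trade_metrics([100, -50, 200, -100]))  # {'NP': 150, 'PP': 50.0, 'PF': 2.0}
```

Under these definitions, a PF of 1.30 (as reported for palladium) means that winning trades earned 30% more in total than losing trades lost.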

**Figure 4.** Net profits trading results of linear regression strategy.

Table 3 and Figure 4 demonstrate that the linear regression methodology is best suited to trading silver, palladium, and gold and less suited to trading copper and platinum. The best setup for both the gold and silver trading systems is 38 days, with which the system generated NPs of USD 177,198 and USD 561,425, respectively. For palladium, the best setup is 20 days, achieving an NP of USD 235,950 with a PF of 1.30. In Table 4, we split our trades into long and short trades to examine whether profitability differs between them.


**Table 4.** Linear regression trading results of long/short strategies.

Notes: NP = Net profit, PP = Percent of profitable trades of all trades, PF = Profit factor. The results for gold and silver are calculated according to their optimum setups of 38 days, copper 37 days, platinum 35 days, and palladium 20 days.

Table 4 indicates that the linear regression technique suits both long and short trades. However, it is a better strategy for long trades than for short trades for all the examined commodities. The difference between long and short trades is significant for all metals in terms of NP and PF. Silver, again, leads the other metals in both long and short trades, with a PF of 1.8 for long trades and 1.55 for short trades.
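A long/short breakdown like Table 4 can be derived from a daily position series. The following Python sketch is an illustration under stated assumptions, not the authors' backtester: fills occur at the close on the day the position changes, one unit is traded, there are no transaction costs, and a position still open at the end of the sample is ignored. Function names and prices are hypothetical.

```python
from typing import Dict, List


def trades_from_positions(closes: List[float], positions: List[int],
                          units: float = 1.0) -> List[Dict]:
    """Close a trade each time the daily position (+1, -1, 0) changes."""
    trades: List[Dict] = []
    side, entry_price = 0, 0.0
    for price, pos in zip(closes, positions):
        if pos != side:
            if side != 0:  # exit the open trade at today's close
                trades.append({"side": "long" if side > 0 else "short",
                               "pnl": side * units * (price - entry_price)})
            side, entry_price = pos, price  # enter the new position (if any)
    return trades  # a trade still open at the end is not counted


def split_by_side(trades: List[Dict]) -> Dict[str, Dict[str, float]]:
    """NP, trade count, and PF separately for long and short trades."""
    summary: Dict[str, Dict[str, float]] = {}
    for side in ("long", "short"):
        pnls = [t["pnl"] for t in trades if t["side"] == side]
        gains = sum(p for p in pnls if p > 0)
        losses = -sum(p for p in pnls if p < 0)
        summary[side] = {"NP": gains - losses, "trades": len(pnls),
                         "PF": gains / losses if losses > 0 else float("inf")}
    return summary


trades = trades_from_positions([10, 12, 11, 13, 12], [1, 1, -1, -1, 1])
print(split_by_side(trades))
```

In this toy run, the long trade gains 1 (bought at 10, exited at 11) and the short trade loses 1 (sold at 11, covered at 12).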

#### *4.2. Darvas Box Strategy*

The Darvas box strategy requires determining the number of days over which the system builds the box formations and delivers buy or sell signals. Again, we start with a random number of days and let our PSO system optimize our objective functions. The best trading results are summarized in Table 5 and Figure 5.

**Figure 5.** Net profits trading results of Darvas boxes strategy.

The trading results of the Darvas boxes methodology, described in Table 5 and Figure 5, show that this methodology, like the linear regression technique, forecasts silver price trends better than those of copper and gold and is less effective in forecasting future prices of platinum and palladium. Our system generated an NP of USD 319,200 for silver, with a PF of 1.55, using a 7-day setup. This setup also proved useful for gold and copper trading. Table 6 divides all trades into long and short trades, using the optimized setup for each metal.


**Table 5.** Darvas boxes strategy trading results.

Notes: NP = Net profit, PP = Percent of profitable trades of all trades, PF = Profit factor, Days = The number of days on which the Darvas box is constructed. \*\* = The highest NP.

**Table 6.** Darvas boxes trading results of long/short strategies.


Notes: NP = Net profit, PP = Percent of profitable trades of all trades, PF = Profit factor. The results for gold, silver, and copper are calculated according to their optimum setups of 7 days, platinum 9 days, and palladium 10 days.

Table 6 shows that, for all five precious metals, the system again performed better for long trades than for short trades. Moreover, short trades produced losses for gold, platinum, and palladium. The only precious metals for which the Darvas boxes technique fits both long and short trades are silver and copper. These results indicate that the system based on the Darvas boxes methodology predicts positive future price trends better than negative ones.

#### *4.3. Bollinger Band Strategy*

Table 7 summarizes the results for the examined metals using the Bollinger band (BB) technique. This methodology calculates a moving average over a predetermined number of trading days and constructs the upper and lower bands two standard deviations from that moving average. Using our PSO system, we optimized the trading results for each commodity in terms of NP, PP, and PF. The results are presented in Table 7 and Figure 6.



Notes: NP = Net profit, PP = Percent of profitable trades of all trades, PF = Profit factor, Days = The number of days on which the Bollinger band is constructed. \*\* = The highest NP.

**Figure 6.** Net profits trading results of Bollinger bands strategy.

Table 7 and Figure 6 indicate that BB best forecasts silver and palladium futures prices and is less effective for copper and platinum. Seven days was found to be the best setup for silver and palladium, while 13 days best fit the gold price forecast. It is worth noting that silver and palladium prices are more volatile than those of the other metals, as was demonstrated in Table 1, which results in relatively short optimal day setups for the BB methodology. The BB technique yielded a higher percentage of profitable trades (PP) for all metals than the linear regression and Darvas boxes techniques, making it the lowest-risk of the three algorithmic trading systems. Table 8 splits the trades into long and short trades.


**Table 8.** Bollinger bands trading results of long/short strategies.

Notes: NP = Net profit, PP = Percent of profitable trades of all trades, PF = Profit factor. The results for gold, silver, and copper are calculated according to their optimum setups of 7 days, platinum 9 days, and palladium 10 days.

Table 8 indicates that, again, the BB methodology fits long trades better than short trades. This technique fails to predict the negative price trends of gold.

#### **5. Summary and Implications**

In this research, we examined the short-term behavior of five major precious metals and sought to determine whether their prices can be predicted and traded using algorithmic trading systems. Using a multidimensional regression model, we found that all precious metals' current daily returns are negatively autocorrelated with their previous days' returns: gold and silver with their returns over the three previous consecutive days, platinum with its returns over the two previous days, and copper and palladium with the previous day's return. The model also identified lagged interdependencies among the examined metals. These findings helped us better understand the daily price fluctuations of each metal and improve the trading systems. The trading systems used three forecasting methodologies: linear regression (LR), Darvas boxes (DB), and Bollinger bands (BB). Our data consisted of 20 years of daily prices for five precious metals futures: gold, silver, copper, platinum, and palladium. Over this long period, the precious metals experienced both high and low price volatility under different economic conditions. We used PSO as our primary optimization tool because of the complexity of our objective function. For the optimization process, we split our data into two equal periods: 10 years for training and optimizing our system and 10 years for testing and reporting results.
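The training/testing protocol described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: a simple grid search over candidate day settings stands in for PSO, and `backtest` is a hypothetical callable that returns a strategy's net profit for a given price series and setting. The key point it shows is that the test half is never used to choose the setting.

```python
from typing import Callable, Iterable, Sequence, Tuple


def split_train_test(series: Sequence[float]) -> Tuple[Sequence[float], Sequence[float]]:
    """Split a chronological series into two equal halves (earlier half = training)."""
    mid = len(series) // 2
    return series[:mid], series[mid:]


def optimize_then_test(series: Sequence[float], candidate_days: Iterable[int],
                       backtest: Callable[[Sequence[float], int], float]):
    """Pick the setting with the best training result, then report it out of sample."""
    train, test = split_train_test(series)
    best_days = max(candidate_days, key=lambda d: backtest(train, d))  # training only
    return best_days, backtest(train, best_days), backtest(test, best_days)


# Hypothetical stand-in for a real backtest: rewards settings close to 7 days
toy_backtest = lambda prices, days: sum(prices) - (days - 7) ** 2
print(optimize_then_test(list(range(10)), [5, 7, 9, 11], toy_backtest))  # (7, 10, 35)
```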

We found that it is possible to forecast the short-term price trends of all the examined precious metals. Moreover, we documented that our systems forecast upward price trends better than downward trends for all examined techniques and commodities. Our systems predict silver futures prices best and platinum prices worst. Linear regression was found to be the best forecasting technique for silver and gold price trends, while the Bollinger band technique best fits palladium. This research shows that precious metals prices can be predicted using algorithmic trading systems, which can therefore be used by researchers, traders, and hedgers.

**Funding:** This research was funded by Western Galilee Academic College.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The author declares no conflict of interest.



MDPI St. Alban-Anlage 66 4052 Basel Switzerland www.mdpi.com

*Mathematics* Editorial Office E-mail: mathematics@mdpi.com www.mdpi.com/journal/mathematics

Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Academic Open Access Publishing

mdpi.com ISBN 978-3-0365-9437-8