Modeling and Forecasting Passenger Car Ownership Based on Symbolic Regression

Lian, Lian; Tian, Wen; Xu, Hongfeng; Zheng, Menglan

doi:10.3390/su10072275


by Lian Lian *, Wen Tian, Hongfeng Xu and Menglan Zheng

School of Transportation and Logistics, Dalian University of Technology, Dalian 116024, China

* Author to whom correspondence should be addressed.

Sustainability 2018, 10(7), 2275; https://doi.org/10.3390/su10072275

Submission received: 13 May 2018 / Revised: 25 June 2018 / Accepted: 26 June 2018 / Published: 2 July 2018

(This article belongs to the Section Sustainable Transportation)


Abstract

Numerous functions, especially the Gompertz function, have been predetermined to analyze the growth in vehicle ownership. This study uses data-driven symbolic regression to automatically find a generalized function, named the new equation by symbolic regression (NE-SR), for passenger car ownership in six representative countries: Japan, England, the USA, Finland, Poland and Australia. The proposed function is then applied to forecast passenger car ownership in China up to the year 2060. The experimental results indicate that the NE-SR, as an extension of the Gompertz function, fits car ownership growth better than the classical Gompertz function. In the NE-SR function, three scenarios arise from the sign of a parameter, and they are represented by the patterns of Japan, the USA and Australia, respectively. The predictions based on the NE-SR also show that Chinese car ownership still has potential to increase after 2060 in the patterns of Japan and Australia, but grows only until around 2057 in the pattern of the USA. The results can be used to further predict the energy demand and carbon emissions of passenger cars, which can provide a basis for policymakers to propose transportation and environmental strategies.

Keywords: vehicle ownership; Gompertz function; per capita GDP; symbolic regression

1. Introduction

The worldwide increase in urban mobility since the 1960s has directly resulted in a growing number of motor vehicles, especially in many low-income populous countries, such as China and India [1]. The tremendous growth of vehicle ownership has caused a series of problems, e.g., increased oil consumption, air pollutant emissions, severe traffic congestion and a lack of parking space. Moreover, car ownership is an important variable in car travel behavior research [2]. Therefore, it is important for academic researchers, environmentalists and policymakers to accurately forecast the development trend of vehicle ownership.

Vehicle ownership modeling has been widely researched. The models developed during 1995–2002 were reviewed and classified into nine categories in [3], and they can be further divided into aggregate and disaggregate models according to data type. In the aggregate models, the ownership level of various vehicles, e.g., cars [4] and hybrid electric vehicles [5], can be analyzed on the basis of the product life cycle and diffusion models, which involve several sigmoid-shaped functions, e.g., the logistic, the Richards and the Gompertz functions [6,7]. Among these three functions, the Gompertz function has been found to best fit historical vehicle ownership data [8], and a variety of studies have examined environmental and transportation policy by assuming that vehicle ownership grows according to the Gompertz function. For example, future vehicle energy demand and greenhouse gas (GHG) emissions were estimated using the Gompertz function [9,10]. The effects of two license quota policies on car ownership levels were compared, and the delays in the process of personal motorization in Shanghai and Beijing were examined, in [11]. However, is there any other function that describes the relationship between economic factors and car ownership better than the Gompertz function? This is an interesting problem that merits in-depth investigation. In fact, several improved Gompertz functions have already been proposed to better forecast vehicle ownership growth [12,13,14], with the corresponding parameters estimated by statistics-based regression methods.

Traditional statistics-based regression methods need to assume a predetermined functional form based on experience and knowledge. The parameters of the assumed function are then estimated by the non-linear least squares method, the maximum likelihood method, etc. These methods generally have solid and widely accepted mathematical foundations and can provide insight into the relationships among variables. However, experience or knowledge is sometimes limited in a given research field, which makes it difficult for traditional regression to determine the most appropriate functional form for a given data set. Moreover, the mathematical functions are built on strong assumptions that are sometimes not relevant to the real world.

Unlike traditional regression methods, symbolic regression (SR) can automatically establish a suitable model of a numeric data set without assuming a functional form. It is a data-driven method based on extended genetic programming (GP), proposed by Cramer in 1985 [15] and developed by Koza [16]. It has been successfully used to uncover hidden relationships in many fields. For instance, SR was demonstrated on four simulated and two real systems spanning mechanics, ecology and systems biology [17]. Motion-tracking data from various physical systems were searched, and Hamiltonians, Lagrangians, and other laws of geometric and momentum conservation were discovered [18]. Hubbert theory in oil production was modeled as a Gaussian distribution [19]. An accurate traffic speed prediction model was built to generate significant information for travellers [20]. To the best of our knowledge, the work in [20] is the first to use this method in the field of transportation.

The Chinese automotive market has grown greatly over the past two and a half decades, and the number of vehicles is expected to increase dramatically further. The medium- and long-term development plan of the automobile industry, issued by the Ministry of Industry and Information Technology of the People’s Republic of China in 2017, forecast that auto production will reach 30 million in 2020 and 35 million in 2025. Therefore, it is necessary to analyze and forecast passenger car ownership in China. Unlike previous studies, which assume that the relationship between vehicle ownership and economic factors is an S-shaped function, this study automatically establishes the relation between passenger car ownership and gross domestic product (GDP) per capita by the data-driven method, SR. The newfound relation includes the Gompertz function as a special case and fits better than the traditional Gompertz function in the six selected countries, whose automotive industries have entered the saturated period.

The remainder of this paper is organized as follows. The SR method and the traditional Gompertz function are briefly introduced in Section 2. The data sources are then presented in Section 3. Section 4 examines our approach on the synthetic data, proposes a novel vehicle ownership function for six representative countries and then applies the proposed function to predict and analyze the car ownership in China. Conclusions are finally drawn in Section 5.

2. Methodology

2.1. Symbolic Regression

The procedures of SR via GP mainly include four steps and the pseudo code is described in Algorithm 1.

(1) Step 1: Population initialization.

The typical representation of an individual in SR is a parse tree, which generally has two types of nodes: leaf nodes and internal nodes. The leaf nodes consist of the terminal symbols, such as decision variables, constants or other problem parameters, and the internal nodes represent arithmetic functions, e.g., +, −, ×, ÷, exp, ln, etc. Figure 1 shows an example of the tree structure of an SR individual for the equation 1 + (x × y). Once the Function Set (FS) for internal nodes, the Symbol Set (SS) for leaf nodes, the Maximum Depth of Tree (DT) and the Population Size (M) are determined, an initial population of M trees is randomly generated with FS, SS and DT.
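The parse-tree representation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nested-tuple encoding, the `FUNCTIONS` table and the helper names `evaluate` and `depth` are assumptions chosen for brevity. The tree encodes the example equation 1 + (x × y).

```python
import operator

# Internal nodes: binary arithmetic functions from a hypothetical function set FS.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node, variables):
    """Recursively evaluate a parse tree given values for its variables."""
    if isinstance(node, tuple):  # internal node: (symbol, left, right)
        symbol, left, right = node
        return FUNCTIONS[symbol](evaluate(left, variables), evaluate(right, variables))
    if isinstance(node, str):  # leaf node holding a decision variable
        return variables[node]
    return node  # leaf node holding a constant


def depth(node):
    """Depth of the tree; a single leaf has depth 1."""
    if isinstance(node, tuple):
        return 1 + max(depth(node[1]), depth(node[2]))
    return 1


# Tree for 1 + (x * y): root "+", left leaf 1, right subtree "*" over x and y.
tree = ("+", 1, ("*", "x", "y"))
value = evaluate(tree, {"x": 2.0, "y": 3.0})  # 1 + 2 * 3
tree_depth = depth(tree)
```

In this encoding, the maximum depth DT bounds how many levels a randomly generated tree may have.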

Algorithm 1. Pseudo code of symbolic regression.

Input: set FS, SS, DT, TC, G, M, Pr, Pc, Pm
Output: Best expression

Generate initial population with FS, SS, DT and M, and set gen = 0.
While gen < G
    Calculate fitness of individuals;
    i = 0;
    While i < M
        If random probability < Pr,
            Reproduction: copy individual i into the next generation;
        If random probability < Pc,
            Crossover: randomly recombine individuals i and i + 1 to create a new individual in the next generation;
        If random probability < Pm,
            Mutation: randomly mutate individual i to create a new individual in the next generation;
        i = i + 1;
    End while
    Memorize the best solution achieved so far;
    gen = gen + 1;
End while

Return best solution.
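The loop of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not in the paper: a tiny function set without exp or ln, truncation selection of the better half as parents, exactly one operator applied per offspring (reproduction with probability Pr, crossover with probability Pc, mutation otherwise), and rejection of children deeper than DT. All function names are hypothetical.

```python
import math
import random

FS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
SS = ["x", 1.0, 2.0]  # leaf symbols: the variable x and two constants


def random_tree(rng, max_depth):
    """Grow a random parse tree with at most max_depth levels."""
    if max_depth <= 1 or rng.random() < 0.3:
        return rng.choice(SS)
    return (rng.choice(sorted(FS)), random_tree(rng, max_depth - 1),
            random_tree(rng, max_depth - 1))


def evaluate(tree, x):
    if isinstance(tree, tuple):
        return FS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))
    return x if tree == "x" else tree


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 1


def mad(tree, samples):
    """Mean absolute deviation as in Equation (1); non-finite errors count as infinite."""
    error = sum(abs(f - evaluate(tree, x)) for x, f in samples) / len(samples)
    return error if math.isfinite(error) else math.inf


def paths(tree, path=()):
    """Yield the index path to every node of the tree."""
    yield path
    if isinstance(tree, tuple):
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))


def get(tree, path):
    for index in path:
        tree = tree[index]
    return tree


def replace(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)


def vary(rng, parent, mate, Pr, Pc, DT):
    """Apply one operator: reproduction, crossover, or (otherwise) mutation."""
    r = rng.random()
    if r < Pr:
        return parent
    target = rng.choice(list(paths(parent)))
    if r < Pr + Pc:
        donor = get(mate, rng.choice(list(paths(mate))))  # subtree from the mate
    else:
        donor = random_tree(rng, DT)  # fresh random subtree
    child = replace(parent, target, donor)
    return child if depth(child) <= DT else parent  # reject oversized children


def symbolic_regression(samples, G=30, M=60, DT=5, Pr=0.1, Pc=0.7, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, DT) for _ in range(M)]
    best, history = None, []
    for _ in range(G):
        ranked = sorted(population, key=lambda t: mad(t, samples))
        if best is None or mad(ranked[0], samples) < mad(best, samples):
            best = ranked[0]  # memorize the best solution achieved so far
        history.append(mad(best, samples))
        parents = ranked[: M // 2]  # truncation selection (an assumption)
        population = [vary(rng, rng.choice(parents), rng.choice(parents), Pr, Pc, DT)
                      for _ in range(M)]
    return best, history


# Toy target f(x) = x^2 + 1 sampled at x = 0, ..., 9.
samples = [(float(x), float(x) * x + 1.0) for x in range(10)]
best, history = symbolic_regression(samples)
```

Because the best individual is memorized across generations, the recorded best fitness in `history` can never get worse, although the sketch gives no guarantee that the exact target expression is recovered.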

(2) Step 2: Fitness evaluation of individuals.

Each tree-structured individual in SR represents one corresponding function f. Given a data sample (X_i, f_i), the function value f(X_i) predicted by the individual can be computed. The fitness of each individual is then measured by the mean absolute deviation (MAD), defined in Equation (1).

MAD = \frac{1}{n} \sum_{i = 1}^{n} | f_{i} - f (X_{i}) |

(1)

where f_i is the actual value of data sample i, f(X_i) is the function value predicted by SR, and n is the number of data samples.

(3) Step 3: Individual evolution by genetic operators.

There are three common genetic operators in GP: reproduction, crossover and mutation. In the reproduction operator, the candidate solution is directly duplicated into the next generation. The crossover operator recombines two parents to generate a new child individual, and the mutation operator randomly mutates a node of the chosen tree. Figure 2 illustrates the three genetic operators in detail [14].

Once the fitness of the population is calculated, the genetic operators are applied to evolve model structures and parameters. The Reproduction Probability (Pr), Crossover Probability (Pc), Mutation Probability (Pm) and Maximum Generation (G) need to be determined to control the evolution in this step.

(4) Step 4: Best model selection among outputs.

SR generally returns a large number of models, and the models with higher precision tend to be selected. However, more precise models are generally more complex and are therefore more likely to be over-fitted. To balance accuracy against complexity and to control over-fitting, a Pareto front is built to select the best solution generated by SR, an approach that is widely used [19,20,21].
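The Pareto-front selection can be sketched as follows. This is a minimal illustration, assuming each candidate model is summarized by a (label, complexity, error) triple where lower is better for both criteria; the candidate values are made up and the function name `pareto_front` is hypothetical.

```python
def pareto_front(models):
    """Return the models not dominated in (complexity, error); lower is better for both."""
    front = []
    for name, complexity, error in models:
        dominated = any(
            c <= complexity and e <= error and (c < complexity or e < error)
            for _, c, e in models
        )
        if not dominated:
            front.append((name, complexity, error))
    return sorted(front, key=lambda m: m[1])


# Hypothetical candidates: (label, complexity, MAD). The numbers are invented.
candidates = [
    ("A", 3, 0.90),
    ("B", 7, 0.40),
    ("C", 9, 0.45),   # dominated by B: more complex and less accurate
    ("D", 15, 0.10),
    ("E", 15, 0.30),  # dominated by D: same complexity, less accurate
]
front = pareto_front(candidates)
```

A final model is then picked from the front by trading a small loss in accuracy for a simpler expression.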

2.2. Gompertz Function

The increase of car ownership can be divided into three periods: a slow-growth period at low income levels, a boom period as income rises rapidly, and a saturated period. Previous studies assumed that the relationship between car ownership and economic factors (e.g., per-capita GDP or per-capita income) follows some sort of S-shaped function, e.g., the logistic, the Richards and the Gompertz functions. The Gompertz function has been found to best fit historical car ownership data [6,8,10]. Moreover, it was shown in [6] that the Gompertz function can effectively describe the long-run relationship between vehicle ownership and per-capita GDP, and that per-capita GDP alone, ignoring other factors, can already substantially explain this relationship.

The basic form of the Gompertz function is expressed as Equation (2).

Gompertz Function: y = α·exp(−β·exp(−γ·x))

(2)

where α is the ultimate saturation level of car ownership, and β and γ are two parameters that determine the shape and curvature of the S-curve. Here x is an economic indicator, denoting per-capita GDP, and y denotes the long-run equilibrium level of cars per 100 people.

Then, Equation (2) is converted to Equation (3) by taking the logarithm of both sides.

ln y = ln α − β·exp(−γ·x)

(3)

If we let $\bar{α} = \ln α$ and f(x) = ln y for short, the equation above is then simplified as Equation (4).

f (x) = \bar{α} - β \cdot \exp (- γ \cdot x)

(4)

If we log-linearize Equation (3), it can be transformed into Equation (5).

\ln (\ln \frac{α}{y}) = \ln β - γ \cdot x

(5)

Then, ln β and γ can be estimated by OLS on time series data. In particular, the ultimate saturation level of car ownership α in a given country must be known or assumed when the sample data do not include the inflection point.
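The log-linear estimation can be sketched in Python. With β > 0 as in Equation (2), rearranging Equation (3) gives ln(α/y) = β·exp(−γ·x), and a second logarithm gives ln(ln(α/y)) = ln β − γ·x, which is linear in x. The sketch below assumes α is known and fits the line by ordinary least squares; the data are noise-free and generated from made-up parameters, and the function name `fit_gompertz_loglinear` is hypothetical.

```python
import math


def fit_gompertz_loglinear(xs, ys, alpha):
    """Estimate beta and gamma by OLS on ln(ln(alpha / y)) = ln(beta) - gamma * x."""
    zs = [math.log(math.log(alpha / y)) for y in ys]  # requires 0 < y < alpha
    n = len(xs)
    x_mean = sum(xs) / n
    z_mean = sum(zs) / n
    slope = sum((x - x_mean) * (z - z_mean) for x, z in zip(xs, zs)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = z_mean - slope * x_mean
    return math.exp(intercept), -slope  # (beta, gamma)


# Noise-free illustration with invented parameters alpha = 50, beta = 4, gamma = 0.15.
alpha, beta, gamma = 50.0, 4.0, 0.15
xs = [float(x) for x in range(1, 21)]
ys = [alpha * math.exp(-beta * math.exp(-gamma * x)) for x in xs]
beta_hat, gamma_hat = fit_gompertz_loglinear(xs, ys, alpha)
```

The transformation is only defined for 0 < y < α, which is one reason the saturation level has to be fixed before the regression.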

3. Data

3.1. Selection of Case Countries

This study aims to use SR to search for a model function of passenger car ownership without assuming the model form; the future vehicle ownership in China is then estimated using the model obtained by SR. Six representative countries are selected as the sample data for SR: the United States, Japan, the United Kingdom, Finland, Poland and Australia. These countries are chosen for three reasons. First, passenger car ownership in these countries has displayed comparatively smooth growth and even a saturated pattern, which means that their growth curves contain an inflection point. Second, vehicle ownership data for these countries are available for a sufficiently long period, covering the slow-growth, boom and saturated periods. Third, the six countries are located in North America, Asia, Europe and Oceania, and can therefore represent different patterns of passenger car ownership to some extent. In contrast, vehicle ownership in China remains in the rapid-growth period and has not reached the inflection point of the growth curve. Government policymakers and business managers need forecasts of Chinese passenger car ownership. Therefore, the model found by SR is applied to the case of China.

3.2. Statistics Data

Population data for all countries are collected from World Population Prospects 2017, issued by the United Nations. The data on GDP per capita for all countries are sourced from the World Bank, which has already converted the GDP of each country to current U.S. dollars. The GDP per capita of China predicted for the period 2018–2060 is based on the Organization for Economic Co-operation and Development (OECD) (2014) [21]. Various sources are used to obtain the historical passenger car ownership data for the seven countries. For the USA, statistics for 1960–2009 are obtained from National Transportation Statistics, compiled by the Bureau of Transportation Statistics. For England, 57 years of passenger car statistics from 1960 to 2016 are derived from the Vehicle Statistics held by the Department for Transport. For Japan, 56 years of statistics for 1960–2015 are gathered from the Japan Statistical Yearbook. For Finland, 57 years of statistics for 1960–2016 are collected from Statistics Finland. For Poland, we use the statistics for 1990–2015 from the database published by the Central Statistical Office of Poland. For Australia, statistics from 1976 to 2016 are gathered from the Australian Bureau of Statistics, Australian Government. The data on Chinese car ownership for 1978–2015 are collected from the China Statistical Yearbook.

4. Experimental Results

In this section, we attempt to answer the following three questions:

- The validation aspect: can SR really find the Gompertz function underlying the data?
- The discovery aspect: is there any other mathematical function that describes the relationship between economic factors and car ownership better than the Gompertz function?
- The prediction aspect: what is the trend of vehicle ownership in China based on the newfound function?

4.1. Validation on Synthetic Data

In order to verify the validation aspect, a Gompertz function is defined in advance and a set of data is generated according to this Gompertz model. Then, the data set is taken as the samples for SR. By learning from the synthetic data, it can be tested whether the SR can efficiently find the Gompertz function.

As mentioned in Section 2, the Gompertz function can be rewritten as Equation (4); the forms in Equations (2) and (4) are equivalent. Once $\bar{α}$, β and γ are estimated, the Gompertz function is identified.

Here, we set α = 20, β = 3 and γ = 0.2 in Equation (2), and the Gompertz function can be expressed as

y = 20·exp(−3·exp(−0.2·x))

(6)

Equation (6) is then converted to Equation (7).

f(x) = 2.9957 − 3·exp(−0.2·x)

(7)

Twenty data points are randomly generated from Equation (7), with the same parameter set as in [22]. Taking them as input samples, SR is run and returns a large number of models. The Pareto front of the returned models is shown as blue points in Figure 3, from which it can be seen that SR correctly discovers Equation (7).
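The construction of the synthetic samples can be sketched as follows. The parameter values α = 20, β = 3 and γ = 0.2 come from the text; the sampling range of x, the random seed and the function names are assumptions, since the paper does not state how the twenty points were drawn. The sketch also checks that the constant 2.9957 in Equation (7) is ln 20 rounded to four decimals.

```python
import math
import random

ALPHA, BETA, GAMMA = 20.0, 3.0, 0.2  # values set in the text for Equation (6)


def gompertz(x):
    """Equation (6): y = 20 * exp(-3 * exp(-0.2 * x))."""
    return ALPHA * math.exp(-BETA * math.exp(-GAMMA * x))


def log_gompertz(x):
    """Equation (7): f(x) = ln(20) - 3 * exp(-0.2 * x)."""
    return math.log(ALPHA) - BETA * math.exp(-GAMMA * x)


rng = random.Random(42)  # fixed seed, an arbitrary choice
xs = sorted(rng.uniform(0.0, 30.0) for _ in range(20))  # sampling range is an assumption
samples = [(x, log_gompertz(x)) for x in xs]  # (x, f(x)) pairs fed to SR
alpha_bar = round(math.log(ALPHA), 4)  # constant term of Equation (7)
```

Exponentiating each f(x) recovers the value of Equation (6) at the same x, which is the sense in which the two forms are equivalent.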

4.2. Model Discovery for the Six Representative Countries

In the discovery aspect, SR is used to learn models from the data of the six representative countries. As a result, a series of functions is generated for each country. A generalized function is selected from the functions of the six countries by considering complexity and precision. This function, named the new equation by symbolic regression (NE-SR), is finally compared against the Gompertz model obtained from SPSS. The difference in accuracy between the proposed function and the Gompertz model indicates whether the new model is preferable to the Gompertz model in describing the relation between car ownership and GDP per capita. The characteristics of the NE-SR are analyzed as well.

4.2.1. New Function Discovery

The car ownership models of the six selected countries are established by SR using the data on ownership, population and GDP per capita introduced in Section 3. Around 10 functions on the Pareto front are returned for each country, and the results are shown in Table 1. Here, the fitness is calculated by the MAD in Equation (1), and the function complexity is the sum of the operator complexities shown in Table 2.

It can be found from Table 1 that: (1) Gompertz function can be obtained by SR for each country except Finland, which means the Gompertz function is not suitable for all the countries; (2) A generalized model (NE-SR) written as

y^{'} = α^{'} \cdot \exp (θ^{'} \cdot x - β^{'} \cdot \exp (- γ^{'} \cdot x))

can be found for each country from the Pareto front. They are the 5th, 5th, 6th, 4th, 4th and 5th model for Japan, England, USA, Finland and Poland, respectively. This suggests that the NE-SR found by SR is a reliable model for describing car ownership. However, it should be noted that the NE-SR is not found for Australia, which is explained in the next section.

4.2.2. Validation of New Function

The NE-SR model is shown to be a good choice for representing the growth of passenger car ownership in two ways: first, by comparing the errors of the NE-SR with those of the Gompertz function generated by SPSS; and second, by analyzing the characteristics of the NE-SR and comparing the trends of passenger car ownership in each country.

The Gompertz function has been widely used to estimate car ownership. Therefore, the new generalized function (NE-SR) is compared with the Gompertz function obtained from SPSS. The comparison results are shown in Table 3. The three errors of the NE-SR, namely the MAD, the mean absolute percentage error (MAPE) and the root mean squared error (RMSE), are all lower than those of the Gompertz function for the five countries. Here, the MAD, MAPE and RMSE are defined in Equations (1), (8) and (9). Therefore, the NE-SR has higher precision than the Gompertz function by SPSS.

MAPE = \frac{1}{n} \sum_{i = 1}^{n} | \frac{f_{i} - f (X_{i})}{f_{i}} |

(8)

RMSE = \sqrt{\frac{\sum_{i = 1}^{n} (f_{i} - f (X_{i}))^{2}}{n}}

(9)
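The three error measures in Equations (1), (8) and (9) can be written directly in Python. The sketch is illustrative only: the sample values are made up, and MAPE is returned as a fraction, following Equation (8) as written (multiply by 100 for a percentage).

```python
import math


def mad(actual, predicted):
    """Mean absolute deviation, Equation (1)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, Equation (8); actual values must be non-zero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, Equation (9)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Invented ownership values (cars per 100 people), for illustration only.
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]
errors = (mad(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

For these values the absolute errors are 2, 2 and 0, so the MAD is 4/3, the MAPE is (0.2 + 0.1 + 0)/3 = 0.1, and the RMSE is the square root of 8/3.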

Because the NE-SR is not found for Australia, the three errors of the Gompertz function found by SR are compared against those of the Gompertz function by SPSS, which is also shown in Table 3. Therefore, the precision of SR is shown to be better than that of the Gompertz function by SPSS for all six representative countries.

The NE-SR for long-run car ownership y’ as a function of per-capita GDP can be written as:

y^{'} = α^{'} \cdot \exp (θ^{'} \cdot x - β^{'} \cdot \exp (- γ^{'} \cdot x))

where α′, β′ and γ′ are positive values.

Since $\lim_{x \to + \infty} (y^{'} - α^{'} \cdot \exp (θ^{'} \cdot x)) = 0$, then $y_{a s y} = α^{'} \cdot \exp (θ^{'} \cdot x)$ is the asymptotic line of the NE-SR. Thus, the parameters $α^{'}$ and $θ^{'}$ determine the future trend of the vehicle ownership. Since the value of parameter $α^{'}$ is positive, we only discuss the effect of the sign of the parameter $θ^{'}$ on the passenger ownership in three scenarios. Figure 4 shows three examples of the NE-SR with different signs of $θ^{'}$.

(1) Scenario 1. When $θ^{'}$ = 0, the NE-SR reduces to the Gompertz function. The Gompertz function is thus a special form of the NE-SR, which is why Table 1 obtained from SR contains only the Gompertz function for passenger car ownership in Australia. In this scenario, the parameter $α^{'}$ denotes the saturation level of long-run passenger car ownership. The transportation system reaches a relatively stable state as the car ownership rate gradually approaches $y_{a s y} = α^{'}$.
(2) Scenario 2. When $θ^{'}$ > 0, instead of reaching a saturation level, the car ownership ratio continues to grow slowly with per-capita GDP and approaches the function $y = α^{'} \cdot \exp (θ^{'} \cdot x)$ in the third period of car ownership. It is reasonable that people continue to buy cars as per-capita GDP grows, which further raises the car ownership ratio. This is supported by [14], which also stated that vehicle ownership grows slowly after the growth rate has reached its saturation level. Notably, the growth of the car ownership ratio is limited because per-capita GDP cannot grow forever. The ownership patterns of Japan, England, Finland and Poland are examples of this scenario.
(3) Scenario 3. When $θ^{'}$ < 0, the car ownership ratio decreases with the growth in per-capita GDP in the third period of vehicle ownership, which seems unusual to a certain extent. However, it has happened in some countries, e.g., the USA, where people choose other travel modes instead of private vehicles as public transit, car sharing and bicycle highways develop and car parking fees increase. Figure 5 compares the car ownership ratios obtained from the NE-SR and the Gompertz function for the USA.
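The three scenarios can be reproduced numerically with a short Python sketch. The parameter values below are invented for illustration and are not the fitted values of any country. For the third scenario, setting the derivative of ln y′ = ln α′ + θ′·x − β′·exp(−γ′·x) to zero gives θ′ + β′·γ′·exp(−γ′·x) = 0, so for −β′·γ′ < θ′ < 0 the curve peaks at x* = −ln(−θ′/(β′·γ′))/γ′; the helper `peak_x` is a hypothetical name for this expression.

```python
import math


def ne_sr(x, alpha, theta, beta, gamma):
    """NE-SR: y' = alpha * exp(theta * x - beta * exp(-gamma * x))."""
    return alpha * math.exp(theta * x - beta * math.exp(-gamma * x))


def gompertz(x, alpha, beta, gamma):
    """Equation (2): y = alpha * exp(-beta * exp(-gamma * x))."""
    return alpha * math.exp(-beta * math.exp(-gamma * x))


def peak_x(theta, beta, gamma):
    """Location of the NE-SR maximum, valid when -beta * gamma < theta < 0."""
    return -math.log(-theta / (beta * gamma)) / gamma


alpha, beta, gamma = 50.0, 4.0, 0.15  # illustrative values only
xs = [float(x) for x in range(0, 101, 5)]

scenario1 = [ne_sr(x, alpha, 0.0, beta, gamma) for x in xs]     # theta = 0: Gompertz
scenario2 = [ne_sr(x, alpha, 0.002, beta, gamma) for x in xs]   # theta > 0: keeps growing
scenario3 = [ne_sr(x, alpha, -0.002, beta, gamma) for x in xs]  # theta < 0: peaks, then declines
x_star = peak_x(-0.002, beta, gamma)
```

With these values the peak of the third scenario falls at x* of roughly 38, after which the curve declines slowly, qualitatively matching the USA pattern described in the text.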

4.3. Forecasting Chinese Vehicle Ownership

To conduct the prediction, the growth of Chinese passenger car ownership is assumed to follow the pattern of each of the six representative countries in turn. That is, China’s car ownership takes the same asymptotic line as each of the six countries. This assumption is reasonable because China’s industrial development generally follows the mature experience of developed countries. Previous studies have used similar assumptions. For instance, the patterns of the OECD, Europe, the USA and Japan are discussed for Chinese vehicle ownership in [15]; the Europe and Japan patterns were taken as the high and medium patterns, respectively, of the stock of private LDVs in China in [8]; and the ownership of highway vehicles, motorcycles and rural vehicles was assumed to follow different patterns of motor vehicle growth in Europe and Asia [23]. Six regression models are then obtained for China by SPSS, and the most preferable one for each scenario described in the previous section is selected. Finally, the future passenger car ownership in China is forecast for each scenario.

Over the past two and a half decades, China has experienced great growth in its automotive market. The number of vehicles in the Chinese passenger car fleet is expected to increase dramatically and to match the current US car population by around 2020 [23]. Therefore, to further examine the newly proposed function, it is applied to analyze the growth of vehicle ownership in China. However, because passenger car ownership in China is still in the boom period, the data do not yet include the inflection point and are insufficient to fit the full development of passenger car ownership. Therefore, the asymptotic line of the NE-SR is determined in advance according to the patterns of the six selected countries, which is similar to the treatment in studies of Chinese car ownership using the Gompertz function [6,9,10].

The regression results of the NE-SR for Chinese vehicle ownership are shown in Table 4. The pattern of Japan fits the Chinese passenger car ownership data best among all the patterns with a positive parameter θ′, followed by those of England, Poland and Finland. Therefore, the patterns of England, Poland and Finland are not discussed further, and the patterns of Japan, the USA and Australia are used to forecast passenger car ownership, because the parameter θ′ of their NE-SR functions is positive, negative and zero, respectively, representing the three scenarios of car ownership.

Figure 6 shows the passenger car ownership ratios in China calculated under the three patterns of Japan, the USA and Australia. In the pattern of Japan, the ratio of passenger car ownership keeps increasing with GDP per capita up to 50,000 dollars, which means that Chinese passenger car ownership has not reached the saturated level. In the pattern of the USA, passenger car ownership reaches a peak of 45.245 per 100 people at a GDP per capita of around 38,402 dollars and then decreases gradually. In the pattern of Australia, i.e., the Gompertz function with α = 55.868, Chinese passenger car ownership grows fastest but does not enter the saturated period by the time GDP per capita reaches 50,000 dollars.

Figure 7 illustrates the corresponding projected Chinese passenger car ownership during the period 2018–2060, based on the GDP per capita predicted in OECD (2014). For comparison, Figure 7 also shows Gompertz curves with the saturation levels of 400 and 500 vehicles per 1000 people (α = 40 and α = 50) assumed in [8]. The results in Figure 7 show that passenger car ownership continues to increase until 2060 in the patterns of the Gompertz function (α = 40 and α = 50) and Australia (α = 55.868), reaching 39.561, 49.034 and 49.276 per 100 people, respectively. This means that the growth of passenger car ownership in China has not reached the saturated level in 2060 for these three scenarios. In the pattern of Japan, passenger car ownership grows slowly, and the ratio is only 33.565 in 2060. In the pattern of the USA, passenger car ownership enters the saturated period before 2060: it increases to 45.245 around the year 2057 and then decreases slowly to 45.173 per 100 people in 2060.

Among these five patterns, the growth in passenger car ownership is slowest in the pattern of Japan and fastest in the pattern of Australia (Gompertz function with α = 55.868). Furthermore, passenger car ownership per 100 people in China will not reach the saturation point by 2060 in the patterns of Japan, Australia and the Gompertz functions with α = 40 and α = 50. In the pattern of the USA, passenger car ownership saturates around 2057. This means that the Chinese passenger car stock still has potential to increase after 2060 in the patterns of Japan, Australia and the Gompertz functions with α = 40 and α = 50, and grows at least until 2057 under all patterns.

5. Conclusions

This paper uses a data-driven symbolic regression method to describe the future development of car ownership; the method has the advantage of automatically establishing suitable models of a numeric data set without assuming function forms. A generalized function is found to better fit the trend of passenger car ownership in six representative countries (Japan, England, the USA, Finland, Poland and Australia), and the traditional Gompertz function is a special case of this new function. Moreover, three scenarios of car ownership are obtained depending on the sign of a parameter in the new function. These scenarios are represented by the patterns of Japan, the USA and Australia (Gompertz function), respectively. Finally, the patterns of these three countries are applied to analyze passenger car ownership in China. The predicted results are compared against two Gompertz functions of car ownership from previous research. They show that Chinese passenger car ownership will reach 39.561, 49.034 and 49.276 per 100 people in 2060 in the patterns of the Gompertz function (α = 40 and α = 50) and Australia (α = 55.868), respectively, but will increase to 45.245 around the year 2057 and then decrease slowly to 45.173 per 100 people in 2060 in the pattern of the USA. This means that Chinese passenger car ownership still has potential to increase after 2060 in the patterns of Japan, Australia and the Gompertz functions with α = 40 and α = 50, and grows at least until around 2057 under all patterns. In the future, the new function will be used to forecast the ownership, energy demand and carbon emissions of various vehicles in the transportation industry, and the predicted results can be used as a basis for policymakers to propose transportation strategies.

Author Contributions

Conceptualization, L.L.; Methodology, W.T.; Software, W.T.; Formal Analysis, L.L.; Data Curation, M.Z.; Writing-Original Draft Preparation, L.L.; Writing-Review & Editing, H.X.; Funding Acquisition, H.X.

Funding

This work was partially supported by the National Natural Science Foundation of China [grant numbers: 51578111, 61374193].

Acknowledgments

The authors gratefully acknowledge the useful suggestions given by Guangfei Yang and Bin Niu at Dalian University of Technology.

Conflicts of Interest

The authors declare no conflict of interest.

References

Chamon, M.; Mauro, P.; Okawa, Y. Mass car ownership in the emerging market giants. Econ. Policy 2008, 54, 243–296.
Acker, V.V.; Witlox, F. Car ownership as a mediating variable in car travel behaviour research using a structural equation modelling approach to identify its dual relationship. J. Transp. Geogr. 2010, 18, 65–74.
De Jong, G.; Fox, J.; Daly, A.; Pieters, M.; Smit, R. Comparison of car ownership models. Transp. Rev. 2004, 24, 379–408.
Qian, L.; Soopramanien, D. Using diffusion models to forecast market size in emerging markets with applications to the Chinese car market. J. Bus. Res. 2014, 67, 1226–1232.
Carlucci, F.; Cirà, A.; Lanza, G. Hybrid electric vehicles: Some theoretical considerations on consumption behaviour. Sustainability 2018, 10, 1302.
Dargay, J.; Gately, D. Income’s effect on car and vehicle ownership, worldwide: 1960–2015. Transp. Res. A Policy Pract. 1999, 33, 101–138.
Dargay, J.; Gately, D.; Sommer, M. Vehicle ownership and income growth, worldwide: 1960–2030. Energy J. 2007, 28, 143–170.
Huo, H.; Wang, M. Modeling future vehicle sales and stock in China. Energy Policy 2012, 43, 17–29.
Zeng, Y.; Tan, X.; Gu, B.; Wang, Y.; Xu, B. Greenhouse gas emissions of motor vehicles in Chinese cities and the implication for China’s mitigation targets. Appl. Energy 2016, 184, 1016–1025.
Das, D.; Sharfuddin, A.; Datta, S. Personal Vehicles in Delhi: Petrol Demand and Carbon Emission. Int. J. Sustain. Transp. 2009, 3, 122–137.
Chen, X.; Zhang, H. Evaluation of Effects of Car Ownership Policies in Chinese Megacities Beijing and Shanghai. Transp. Res. Rec. 2012, 2317, 32–39.
Wu, T.; Zhang, M.; Ou, X. Analysis of Future Vehicle Energy Demand in China Based on a Gompertz Function Method and Computable General Equilibrium Model. Energies 2014, 7, 7454–7482.
Wu, T.; Zhao, H.; Ou, X. Vehicle Ownership Analysis Based on GDP per Capita in China: 1963–2050. Sustainability 2014, 6, 4877–4899.
Lu, H.; Ma, H.; Sun, Z.; Wang, J. Analysis and Prediction on Vehicle Ownership Based on an Improved Stochastic Gompertz Diffusion Process. J. Adv. Transp. 2017, 2017.
Cramer, N.L. A representation for the adaptive generation of simple sequential programs. In Proceedings of the 1st International Conference on Genetic Algorithms and Their Applications, Hillsdale, NJ, USA, 24–26 July 1985.
Koza, J. Genetic Programming: On the Programming of Computers by Means of Natural Selection; MIT Press: Cambridge, MA, USA, 1992; p. 222.
Bongard, J.; Lipson, H. Automated reverse engineering of nonlinear dynamical systems. Proc. Natl. Acad. Sci. USA 2007, 104, 9943–9948.
Schmidt, M.; Lipson, H. Distilling Free-Form Natural Laws from Experimental Data. Science 2009, 324, 81–85.
Yang, G.; Li, X.; Wang, J.; Lian, L.; Ma, T. Modeling oil production based on symbolic regression. Energy Policy 2015, 82, 48–61.
Li, L.; Tomislav, F.; Zhang, J.; Ran, B. Traffic Speed Prediction for Highway Operations Based on a Symbolic Regression Algorithm. Promet Traffic Transp. 2017, 29, 433–441.
Organization for Economic Co-operation and Development (OECD). OECD Economic Outlook: Statistics and Projections (Database); Long-Term Baseline Projections, No. 95 (Edition 2014); Organization for Economic Co-operation and Development (OECD): Paris, France, 2014.
Karaboga, D.; Ozturk, C.; Karaboga, N.; Gorkemli, B. Artificial bee colony programming for symbolic regression. Inf. Sci. 2012, 209, 1–15.
Huo, H.; Wang, M.; Johnson, L.; He, D. Projection of Chinese motor vehicle growth, oil demand, and CO₂ emissions through 2050. Transp. Res. Rec. 2007, 2038, 69–77.

Figure 1. An example of the tree structure of a genetic programming (GP) individual.

Figure 2. Examples of the three genetic operators applied to GP individuals. (a) Reproduction; (b) Crossover; (c) Mutation.

Figure 3. Pareto front models discovered for the synthetic data.

Figure 4. Examples of the new function with α′ = 2.1, β′ = 3.953, γ′ = 0.239 and three alternative values of θ′.

Figure 5. Passenger car ownership ratio in the USA obtained from the new equation by symbolic regression (NE-SR) and the Gompertz function.

Figure 6. Trend of Chinese passenger car ownership obtained from the NE-SR functions in the pattern of Japan, USA and Australia.

Figure 7. Projected Chinese passenger car ownership during the period of 2018–2060.

Table 1. Vehicle ownership models of the six selected countries generated by symbolic regression.

No.	Complexity	Fitness	Function
Japan
1	1	9.9056	exp(3.137)
2	5	4.7483	exp(2.343 + 0.031·x)
3	9	3.7509	exp(3.420−exp(−0.078·x))
4	11	2.5623	exp(3.433−3.713·exp(−0.312·x))
5	15	1.7238	exp(2.735 + 0.0191·x−5.004·exp(−0.906·x))
6	21	1.1385	exp(3.451−1.572·exp(−0.113·x)−5.505·exp(−1.469·x))
7	33	0.8801	exp(1.262 + 0.103·x + 0.381·x·exp(−0.161·x)−0.00121·x^2−5.286·exp(−1.779·x))
8	45	0.8618	exp(3.807−2.587 × 10⁻⁶·x^3−2.490·exp(−0.164·x)−5.540·exp(−1.822·x)−0.00187·x^3·exp(−0.164·x))
9	53	0.8613	exp(3.806−2.597 × 10⁻⁶·x^3−2.496·exp(−0.165·x)−0.00188·x^3·exp(−0.165·x)−5.540·exp(−1.822·x−0.00756·x^3))
10	57	0.8574	exp(3.805−2.565 × 10⁻⁶·x^3−2.516·exp(−0.165·x)−0.00188·x^3·exp(−0.165·x)−5.513·exp(−1.833·x−0.007·x^5))
England
1	1	10.4284	exp(3.594)
2	5	4.1432	exp(3.192 + 0.018·x)
3	9	1.7282	exp(3.909−exp(−0.074·x))
4	11	1.6742	exp(3.90−1.104·exp(−0.086·x))
5	15	1.5065	exp(3.383 + 0.012·x−1.787·exp(−0.574·x))
6	21	0.8251	exp(3.103 + 0.035·x−0.000397·x^2−6.545·exp(−1.569·x))
7	29	0.7797	exp(3.067 + 0.043·x + 4.262 × 10⁻⁶·x^3−0.000752·x^2−7.250·exp(−1.664·x))
8	31	0.7398	exp(3.231 + 0.027·x−0.000285·x^2−0.310·exp(−0.222·x)−11.451·exp(−2.063·x))
9	33	0.7329	exp(2.926 + 0.039·x + 0.051·x·exp(−0.107·x)−0.000394·x^2−11.451·exp(−2.063·x))
10	35	0.7212	exp(2.976 + 0.040·x + 0.022·x·exp(−0.00548·x^2)−0.000435·x^2−11.071·exp(−2.018·x))
USA
1	1	2.6715	exp(3.868)
2	5	2.5630	exp(3.902−0.00119·x)
3	9	1.5712	exp(3.878−exp(−0.363·x))
4	11	1.4029	exp(3.877−1.830·exp(−0.523·x))
5	13	0.7655	exp(4.004−0.004·x−exp(−0.273·x))
6	15	0.5831	exp(3.996−0.00379·x−1.496·exp(−0.373·x))
7	21	0.4778	exp(4.128 + 0.000117·x^2−0.012·x−1.389·exp(−0.281·x))
8	23	0.4614	exp(3.829 + 0.145·x·exp(−0.141·x)−0.931·exp(−0.137·x))
9	25	0.4416	exp(3.879 + 0.143·x·exp(−0.156·x)−exp(−0.150·x)−0.00104·x)
10	29	0.4021	exp(5.427 + 0.00283·x^2−0.113·x−0.0000244·x^3−2.428·exp(−0.139·x))
11	37	0.3178	exp(2.273 + 0.067·x + 0.722·x·exp(−0.144·x)−0.000738·x^2−0.094·x^2·exp(−0.269·x))
12	47	0.3051	exp(3.057 + 0.00156·x^2 + 0.121·x^2·exp(−0.437·x) + 0.030·x^2·exp(−0.141·x)−0.015·x−0.0000203·x^3)
13	49	0.3043	exp(1.943 + 0.0808·x + 0.746·x·exp(−0.140·x) + 0.247·x·exp(−0.286·x)−0.000895·x^2−0.122·x^2·exp(−0.286·x))
Finland
1	1	13.1614	exp(3.597)
2	5	5.3372	exp(2.959 + 0.023·x)
3	11	3.5604	exp(2.624 + 0.061·x−0.000689·x^2)
4	15	2.3876	exp(3.184 + 0.018·x−5.579·exp(−0.957·x))
5	21	1.9225	exp(2.858 + 0.0417·x−0.000374·x^2−7.575·exp(−1.323·x))
6	29	1.7453	exp(2.702 + 0.070·x + 0.0000150·x^3−0.00163·x^2−13.959·exp(−1.846·x))
7	31	1.7304	exp(2.702 + 0.070·x + 0.0000150·x^3−0.00163·x^2−20.542·x·exp(−2.352·x))
8	33	1.7270	exp(2.635 + 0.082·x + 0.0000209·x^3−0.00213·x^2−3.043·x·exp(−0.725·x^2))
9	43	1.5750	exp(2.537 + 0.116·x + 0.000113·x^3−9.025 × 10⁻⁷·x^4−0.00511·x^2−3.061·x·exp(−0.763·x^2))
10	45	1.5404	exp(2.523 + 0.114·x + 0.0000726·x^3−6.920 × 10⁻⁹·x^5−0.00444·x^2−3.072·x·exp(−0.775·x^2))
11	47	1.5053	exp(2.573 + 0.099·x + 1.512 × 10⁻⁶·x^4−1.839 × 10⁻⁸·x^5−0.00289·x^2−3.054·x·exp(−0.752·x^2))
12	49	1.4012	exp(2.600 + 0.079·x + 4.584 × 10⁻⁶·x^4−4.031 × 10⁻⁸·x^5−0.000149·x^3−3.099·x·exp(−0.758·x^2))
Poland
1	1	10.5362	exp(3.372)
2	5	2.5405	exp(2.746 + 0.086·x)
3	11	1.9455	exp(4.038−1.816·exp(−0.166·x))
4	13	1.7844	exp(2.991 + 0.067·x−exp(−0.462·x))
5	15	1.7548	exp(3.067 + 0.061·x−1.239·exp(−0.476·x))
6	17	1.7191	exp(3.413 + 0.00267·x^2−1.541·exp(−0.385·x))
7	21	1.7190	exp(3.413 + 7.466 × 10⁻⁶·x + 0.00267·x^2−1.541·exp(−0.385·x))
8	27	1.6692	exp(2.031 + 0.423·x + 0.00176·x^3−1.811 × 10⁻⁷·exp(x)−0.044·x^2)
9	39	1.4186	exp(2.165 + 0.317·x + 0.0000263·exp(x) + 1.148 × 10⁻¹²·exp(1.995·x)−0.019·x^2−0.00000191·x·exp(x))
10	43	1.3147	exp(2.163 + 0.316·x + 0.0000153·exp(x) + 1.133 × 10⁻¹²·exp(0.0000158·exp(x))−0.019·x^2−1.041 × 10⁻⁶·x·exp(x))
11	53	1.3027	exp(2.189 + 0.302·x + 0.0000127·exp(x) + 1.789 × 10⁻¹²·exp(0.0000158·exp(x))−0.017·x^2−8.323 × 10⁻⁷·x·exp(x)−3.580 × 10⁻¹³·exp(1.995·x))
12	55	1.2004	exp(2.004 + 0.443·x + 0.002·x^3 + 1.858 × 10⁻¹¹·exp(0.0000157·exp(x))−2.985 × 10⁻⁷·exp(x)−0.048·x^2−4.114 × 10⁻²²·exp(0.017·x^3))
Australia
1	1	4.3809	exp(3.977)
2	5	2.6312	exp(3.862 + 0.00308·x)
3	9	0.8382	exp(4.022−exp(−0.110·x))
4	11	0.8353	exp(4.023−0.958·exp(−0.108·x))
5	29	0.7929	exp(3.189 + 0.069·x + 3.289 × 10⁻⁵·x^3−1.788 × 10⁻⁷·x^4−0.00225·x^2)
6	31	0.7752	exp(3.193 + 0.067·x + 2.280 × 10⁻⁵·x^3−1.086 × 10⁻⁹·x^5−0.00202·x^2)
7	33	0.7731	exp(3.212 + 0.063·x + 1.758 × 10⁻⁵·x^3−8.542 × 10⁻¹²·x^6−0.00177·x^2)
8	41	0.7727	exp(3.212 + 0.063·x + 1.758 × 10⁻⁵·x^3 + exp(−1.086·x)−8.541 × 10⁻¹²·x^6−0.00177·x^2)
9	43	0.7727	exp(3.212 + 0.063·x + 1.758 × 10⁻⁵·x^3 + 81.550·exp(−1.674·x)−8.541 × 10⁻¹²·x^6−0.00177·x^2)

Table 2. Complexity setting for operators.

Name	Complexity
Constant	1
Input Variable	1
Addition	1
Subtraction	1
Multiplication	1
Division	2
Exponential	4
Natural Logarithm	4
Sine	3
Cosine	3
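
The complexity values in Table 1 appear to be the sum of the Table 2 weights over every node of the expression inside the outermost exp(…). This reading is an inference from the first Japan rows, not a rule stated in the tables themselves. A minimal Python sketch, using a hypothetical nested-tuple encoding of the expression tree, that reproduces those values:

```python
# Node weights copied from Table 2.
WEIGHTS = {
    "const": 1, "var": 1,
    "add": 1, "sub": 1, "mul": 1, "div": 2,
    "exp": 4, "log": 4, "sin": 3, "cos": 3,
}

def complexity(node):
    """Sum the Table 2 weights over a tree of nested tuples (op, child, ...)."""
    op, *children = node
    return WEIGHTS[op] + sum(complexity(child) for child in children)

C = ("const",)  # any numeric constant
X = ("var",)    # the input variable x

# Expressions inside the outer exp(...) of Japan models 1 to 5 in Table 1.
# Assumption: the outer exp itself is not counted.
japan = [
    C,                                               # 3.137
    ("add", C, ("mul", C, X)),                       # 2.343 + 0.031*x
    ("sub", C, ("exp", ("mul", C, X))),              # 3.420 - exp(-0.078*x)
    ("sub", C, ("mul", C, ("exp", ("mul", C, X)))),  # 3.433 - 3.713*exp(-0.312*x)
    ("sub", ("add", C, ("mul", C, X)),
            ("mul", C, ("exp", ("mul", C, X)))),     # 2.735 + 0.0191*x - 5.004*exp(-0.906*x)
]

print([complexity(tree) for tree in japan])  # [1, 5, 9, 11, 15]
```

The printed values match the Complexity column for those five rows, which supports the assumption that the outer exponential is excluded from the count.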

Table 3. Comparison results between the NE-SR and Gompertz functions by SPSS.

Country	Model	Function	MAD	MAPE	RMSE
Japan	NE-SR	exp(2.735 + 0.019∙x − 5.004∙exp(−0.906∙x))	1.724	0.091	2.455
	Gompertz	31.13∙exp(−2.291∙exp(−0.151∙x))	1.814	0.47	2.31
USA	NE-SR	exp(3.996 − 0.004∙x−1.496∙exp(−0.373∙x))	0.583	0.012	1.005
	Gompertz	48.654∙exp(−2.178∙exp(−0.565∙x))	1.498	0.031	2.014
England	NE-SR	exp(3.383 + 0.012∙x − 1.787∙exp(−0.574∙x))	1.506	0.051	1.956
	Gompertz	48.946∙exp(−1.156∙exp(−0.089∙x))	1.761	0.078	2.297
Finland	NE-SR	exp(3.184 + 0.018∙x − 5.579∙exp(−0.957∙x))	2.388	0.076	3.428
	Gompertz	57.167∙exp(−1.685∙exp(−0.065∙x))	3.13	0.185	3.764
Poland	NE-SR	exp(3.067 + 0.061∙x − 1.239∙exp(−0.476∙x))	1.755	0.051	2.741
	Gompertz	64.848∙exp(−1.83∙exp(−0.133∙x))	1.983	0.059	2.764
Australia	NE-SR	exp(4.023 − 0.958∙exp(−0.108∙x))	0.835	0.017	1.198
	Gompertz	50.799∙exp(−1.706∙exp(−0.453∙x))	4.516	0.094	5.362
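
The three error measures in Table 3 are not defined in this section. The sketch below assumes the standard definitions: mean absolute deviation, mean absolute percentage error expressed as a fraction, and root mean square error. The numbers are toy values chosen for easy checking, not the paper's data.

```python
import math

def mad(actual, fitted):
    """Mean absolute deviation between observed and fitted values."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)

def mape(actual, fitted):
    """Mean absolute percentage error, returned as a fraction (0.1 means 10%)."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, fitted)) / len(actual)

def rmse(actual, fitted):
    """Root mean square error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual))

# Toy values for illustration only (NOT taken from the paper).
actual = [10.0, 20.0, 40.0]
fitted = [12.0, 18.0, 40.0]

print(round(mad(actual, fitted), 3))   # 1.333
print(round(mape(actual, fitted), 3))  # 0.1
print(round(rmse(actual, fitted), 3))  # 1.633
```

Under these definitions a lower value means a closer fit on all three measures, which is how the NE-SR and Gompertz rows for each country are compared.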

Table 4. Chinese passenger car ownership under the NE-SR pattern of six countries.

Pattern	Function	MAD	MAPE	RMSE
Japan	15.41∙exp(0.019∙x − 3.884∙exp(−0.226∙x))	0.262	1.392	0.292
England	29.459∙exp(0.012∙x − 4.289∙exp(−0.160∙x))	0.305	1.774	0.338
USA	54.38∙exp(−0.004∙x − 4.819∙exp(−0.132∙x))	0.323	1.925	0.364
Poland	21.477∙exp(0.061∙x − 3.848∙exp(−0.14∙x))	0.341	2.007	0.386
Finland	24.143∙exp(0.018∙x − 3.469∙exp(−0.144∙x))	0.476	3.492	0.548
Australia	55.868∙exp(0∙x − 4.093∙exp(−0.085∙x))	0.857	4.260	1.033
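
Each function in Table 4 has the form a·exp(b·x − c·exp(−d·x)); the names a, b, c and d are placeholders chosen here, not the paper's symbols. Setting the derivative of the exponent to zero, b + c·d·exp(−d·x) = 0, shows that growth can stop only when b < 0, at x* = ln(c·d/(−b))/d (assuming c > 0 and d > 0). With b = 0 the curve saturates at a, and with b > 0 it keeps growing. A minimal sketch that applies this to the Japan, USA and Australia rows, in the units of x used in the table (the origin of x is not restated in this section, so x* is not converted to a calendar year):

```python
import math

# (a, b, c, d) copied from the Japan, USA and Australia rows of Table 4.
PATTERNS = {
    "Japan": (15.41, 0.019, 3.884, 0.226),
    "USA": (54.38, -0.004, 4.819, 0.132),
    "Australia": (55.868, 0.0, 4.093, 0.085),
}

def ownership(x, a, b, c, d):
    """Evaluate a * exp(b*x - c*exp(-d*x))."""
    return a * math.exp(b * x - c * math.exp(-d * x))

def turning_point(b, c, d):
    """Return x where growth stops, or None if the curve never declines.

    Assumes c > 0 and d > 0, as in every row of Table 4.
    """
    if b >= 0:
        return None
    return math.log(c * d / -b) / d

for name, (a, b, c, d) in PATTERNS.items():
    print(name, turning_point(b, c, d))
```

Only the USA pattern, with its negative linear coefficient, yields a finite turning point (x* between 38 and 39); the Japan and Australia patterns return None, consistent with the three scenarios distinguished by parameter sign.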

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
