1. Introduction
Reducing CO2 emissions requires engagement at multiple levels. In addition to institutional actions, changes at the local level, particularly at the lowest levels of the social structure, are also necessary [1]. The individual behaviors of vehicle users have a significant impact on global CO2 emissions. Daily operation of private cars, driving style, and attention to fuel efficiency contribute to cumulative effects that can influence the transition towards more sustainable transport [2]. In light of this, it becomes justified to analyze the factors affecting CO2 emissions from motor vehicles, which is the focus of this article.
The aim of the study is to analyze the impact of selected dynamic vehicle parameters on whether CO2 emissions stay within an adopted reference level, depending on the vehicle’s driving mode, i.e., start-up, urban driving, or highway driving. The analysis uses logistic regression models built on data from the OBD system, collected under real driving conditions.
This has led to the following contributions:
- Identification of key dynamic vehicle parameters influencing CO2 emissions depending on the driving mode used. 
- Comparison and assessment of CO2 emissions across different driving modes, enabling the determination of which modes are characterized by the highest environmental burden and how they can be optimized. 
- Proposal of a modeling technique that reflects real vehicle operating conditions and provides an understanding of how dynamic parameters affect CO2 emissions under various road conditions. 
- Development of a method that allows for optimizing fuel consumption and CO2 emissions at the individual user level through better management of velocity and driving fluidity. 
The structure of the article is as follows. After introducing the subject based on current knowledge and the authors’ experience, the issue of Poland’s emissions (the country of study) is presented in the context of the EU. Next, the research methods used in the article are presented. The most important element of the paper is the mathematical analysis of the collected empirical data, followed by the presentation of logistic regression models for the three driving modes, along with their evaluation and comparison. The article concludes with a discussion of the results and the presentation of final conclusions.
  2. Greenhouse Gas Emissions in the European Union (EU)
Road transport is one of the sectors that plays a significant role in greenhouse gas emissions within the European Union (EU). CO2 emissions from transport pose a substantial challenge in the fight against climate change, accounting for approximately 30% of the total CO2 emissions in the EU [3]. About 70% of these emissions come from road transport, with passenger cars responsible for 61% of that figure [4]. Consequently, efforts to reduce these emissions are the focus of numerous regulations introduced by the EU.
In 2021, there were approximately 250 million registered vehicles across the EU, with the average number of vehicles per country at about 10 million. Figure 1 shows the countries whose number of registered vehicles exceeded the EU average between 2017 and 2021, namely Germany, France, Spain, Italy, and Poland. Additionally, a slight increase in the number of vehicles can be observed in Poland during this period.
When examining the number of vehicles older than 10 years in the aforementioned EU countries (Figure 2), it can be observed that Italy leads with approximately 23 million vehicles, representing 59% of all vehicles in the country, compared to the EU average of 5.75 million vehicles. Poland ranks third, with around 21 million vehicles, which constitutes a striking 80% of all vehicles on Polish roads. Thus, the vehicle fleet in the EU is not among the newest.
This is confirmed by the average age of vehicles in the EU, which stands at 12 years. Greece has the oldest vehicles, with an average age of 17 years, while Luxembourg has the newest, with an average age of 7.6 years. The average vehicle age in Germany and France is around 10 years (Figure 3). In Poland, the average age of vehicles is approximately 14 years, which is higher than the EU average. This is significant, as the age of vehicles has a substantial impact on emissions. Older vehicles often do not meet modern standards, which can lead to higher pollutant emissions compared to newer models. Therefore, the technical condition and reliability of vehicles are crucial factors as well [6,7]. In line with this, starting in 2035, the EU will enforce a ban on the sale of new passenger cars and delivery vehicles powered by internal combustion engines using traditional fuels (LPG, diesel, and gasoline types 95 and 98) [3,8]. The long-term goal is to reduce greenhouse gas emissions from transport by 90% by 2050, as compared to 1990 levels [9]. Achieving these targets requires comprehensive changes across all aspects of motor vehicles. It is worth noting that transport is the only sector where greenhouse gas emissions increased between 1990 and 2019, by approximately 34% [4]. In Poland, emissions from transport rose 2.5 times between 1990 and 2016 [10].
  3. CO2 Emissions in Motor Vehicles in the Context of Selected Factors
CO2 emissions in motor vehicles are a complex process directly related to fuel combustion. Burning one liter of gasoline produces approximately 2.3 kg of CO2, while burning one liter of diesel generates about 2.7 kg of CO2 [11]. A review of the literature indicates that CO2 emissions are influenced by several factors, including:
- vehicle characteristics, such as vehicle parameters (mass, velocity, etc.) [12], type and specifications of the internal combustion engine [13], vehicle type [14], and vehicle profile (aerodynamics) [15],
- type of fuel, such as gasoline, diesel, and alternative fuels [16,17,18],
- environmental conditions, such as terrain, ambient temperature, and weather conditions [11],
- driver’s driving style [19].
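The per-liter emission factors quoted at the start of this section can be related to the g/km scale used later in the article. The sketch below is only an illustration: the function name and the example consumption figures are assumptions, and the conversion ignores everything except the amount of fuel burned.

```python
# Emission factors quoted in the text (kg of CO2 per liter of fuel burned) [11].
CO2_KG_PER_LITER = {"gasoline": 2.3, "diesel": 2.7}


def co2_g_per_km(fuel_l_per_100km: float, fuel: str = "gasoline") -> float:
    """Convert average fuel consumption (L/100 km) to CO2 emissions (g/km)."""
    kg_per_100km = fuel_l_per_100km * CO2_KG_PER_LITER[fuel]
    return kg_per_100km * 1000.0 / 100.0  # kg per 100 km -> g per km


# Hypothetical consumption figures chosen for illustration only.
print(round(co2_g_per_km(8.7, "gasoline"), 1))  # 200.1
print(round(co2_g_per_km(5.0, "diesel"), 1))    # 135.0
```

Under these factors, a gasoline car consuming roughly 8.7 L/100 km emits about 200 g of CO2 per kilometer.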
This article focuses on dynamic (instantaneous) parameters related to the vehicle and its engine, including vehicle velocity, engine load, engine speed (rpm), engine power, torque, and throttle position. Vehicle velocity is one of the key kinematic parameters in the context of CO2 emissions, and along with engine speed, it can be monitored and controlled by any driver while operating a vehicle [20]. The optimal vehicle velocity, at which CO2 emissions are minimized, depends on various factors, including the specific vehicle model. One article [21] demonstrates that driving at a velocity of 90 km/h, compared to 110 km/h, can reduce CO2 emissions (as well as fuel consumption) by approximately 23%. For most combustion engine vehicles, CO2 emissions increase sharply at velocities exceeding around 100 km/h, particularly above 120 km/h [22]. This is partly due to the fact that air resistance increases with the square of the vehicle’s velocity. When it comes to engine speeds, revolutions exceeding around 3000 RPM typically lead to higher fuel consumption and CO2 emissions [22]. This happens because an internal combustion engine, when running at higher RPMs, burns more fuel, directly resulting in greater CO2 emissions [23]. However, CO2 emissions at maximum engine speeds can vary depending on the engine’s type and size [24]. Additional factors influencing CO2 emissions include engine power and torque. Engine power determines the engine’s capacity to perform work and is the product of engine speed and torque. The trend of increasing engine power in passenger vehicles is one of the reasons why CO2 emissions from road transport in the EU remain high [25]. On the other hand, high torque at low engine speeds allows for dynamic acceleration without the need for the engine to operate at high RPMs, which can lead to reduced fuel consumption and CO2 emissions. Another important factor is engine load, which reflects the intensity of the engine’s work relative to its maximum output and influences fuel combustion efficiency [26]. Drivers who tend to accelerate aggressively cause rapid changes in throttle position, which controls the amount of air delivered to the engine. This, in turn, directly affects the air-fuel mixture, leading to increased CO2 emissions [19]. The aforementioned parameters are interdependent. The impact of vehicle velocity on CO2 emissions can vary depending on the engine load and its speed (rpm). Additionally, it is essential to account for differences between various vehicle types and the technologies applied in their engines.
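Two of the physical relationships mentioned above, power as the product of torque and engine speed and air resistance growing with the square of velocity, can be illustrated numerically. The following is a minimal sketch; the function names and the example operating point (200 Nm at 3000 rpm) are assumptions made for illustration, and the drag comparison deliberately ignores rolling resistance and drivetrain losses.

```python
import math


def engine_power_kw(torque_nm: float, engine_speed_rpm: float) -> float:
    """Engine power in kW from torque (N*m) and engine speed (rpm)."""
    angular_speed = engine_speed_rpm * 2.0 * math.pi / 60.0  # rpm -> rad/s
    return torque_nm * angular_speed / 1000.0                # W -> kW


def drag_ratio(velocity_kmh: float, reference_kmh: float) -> float:
    """Ratio of aerodynamic drag forces, assuming drag grows with velocity squared."""
    return (velocity_kmh / reference_kmh) ** 2


print(round(engine_power_kw(200.0, 3000.0), 1))  # 62.8
print(round(drag_ratio(110.0, 90.0), 2))         # 1.49
```

With these assumptions, driving at 110 km/h instead of 90 km/h raises the aerodynamic drag force by roughly half, which is consistent in direction with the emission differences reported in [21].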
The dynamic parameters discussed above reach different values depending on the specific driving cycle of the vehicle [19]. From the perspective of fuel consumption and CO2 emissions, vehicle usage can be divided into three distinct stages: engine startup, urban driving, and highway driving [27]. Each of these stages is characterized by different fuel consumption levels and emissions. The engine startup phase is a short-term stage during which fuel consumption is particularly high. Before the internal combustion engine reaches its optimal operating temperature, it can emit higher levels of CO2 [27]. Engine startup is a transient state, and its emission levels depend on many factors, including ambient temperature and the length of time the vehicle has been idle before starting [28]. Urban driving involves operating the vehicle in city traffic conditions, characterized by frequent stops and starts, leading to higher fuel consumption and CO2 emissions compared to highway driving. On highways, speed limits are generally higher, and longer distances are covered without stopping. This results in lower fuel consumption and lower CO2 emissions than in urban driving [29]. In turn, satellite navigation systems (GNSS, Global Navigation Satellite Systems), by providing precise location data, also contribute to optimizing vehicle velocity and reducing CO2 emissions in road transport, regardless of the driving mode [30].
Another important issue is the accuracy of CO2 emission measurements, which are crucial for assessing the environmental impact of motor vehicles. The current testing procedure in Europe is the WLTP (Worldwide Harmonized Light Vehicles Test Procedure), introduced in 2017, which replaced the older NEDC (New European Driving Cycle) standard [20]. WLTP takes into account various vehicle configurations that affect mass and aerodynamics. The test lasts about 30 min, covering a distance of approximately 23 km [31]. However, this is a laboratory method. In contrast, for real-world measurements, RDE (Real Driving Emissions) testing is performed using PEMS (Portable Emissions Measurement Systems) mounted on vehicles [32]. These devices enable emissions measurement during normal vehicle operation in various road and climatic conditions. They allow for the identification of discrepancies between laboratory results and actual emissions [32].
In addition, the studies described in [33,34] provide important data on road transport emissions from real case studies and can be used in the modeling of CO2 emissions from combustion engine vehicles. Furthermore, the analysis of emission reductions during the COVID-19 lockdown presented in [35] can in turn provide a basis for assessing the impact of different road transport scenarios on GHG emissions.
It is also important to consider the tools available for analyzing CO2 emission levels. Machine learning classification algorithms are frequently used in the analysis of CO2 emission data. The most frequently mentioned algorithms are XGBoost, decision trees, random forest, KNN, SVM, stochastic gradient descent, and naive Bayes [36,37]. The algorithms that stand out in terms of accuracy are random forest, KNN, and SVM, which show the best goodness-of-fit indicators in [36]. In the research conducted in [37], in turn, the XGBoost algorithm showed the highest accuracy among the methods tested. Decision trees were used in [38] to identify various factors (demographic, socioeconomic, car or bicycle ownership, etc.) affecting CO2 emissions. In [39], a random forest model was employed to predict CO2 emissions and fuel consumption by vehicles in developing urban areas. Furthermore, the application of SVMs for predicting CO2 emissions in cars using geographic information systems was described in [40,41]. Another method used for this purpose is logistic regression, which is rarely mentioned alongside these algorithms, mainly due to differences in its mathematical model. Algorithms with a more complex mathematical model, e.g., SVM or XGBoost, can analyze large datasets directly, often without any data aggregation. The innovative approach proposed in this paper is to aggregate the data into three subsets corresponding to the vehicle’s driving cycles, each containing characteristic emission features, and then to use logistic regression to classify the data grouped in this way. This approach allows for the use of a classification method with a less complex mathematical model, which has a number of significant advantages over other algorithms used for this purpose.
Logistic regression is a widely used statistical method for modeling the probability of a binary outcome based on a usually small number of predictor variables. Unlike linear regression, which models continuous outcomes, logistic regression is intended for classification tasks where the goal is to assign observations to distinct classes. This approach has been demonstrated, for example, in [42], where a logistic regression model was applied to study CO2 emissions from 2000 spark-ignition (SI) engine vehicles, showing that vehicle age is the most significant factor influencing emission levels. Another article [43] analyzed data from vehicle inspection and maintenance, using logistic regression to determine factors influencing emissions. The results indicate that vehicle age, average fuel consumption, mileage, engine operating parameters, vehicle mass, and overall technical condition are strong determinants of emissions, which was also confirmed in [44]. The data most commonly used as the basis of logistic regression models are the previously mentioned general or technical parameters of vehicles or other statistical data related to them. Another innovation of this work is the use of data derived directly from the car’s OBD system (instantaneous values of dynamic parameters) to build a model whose purpose is to determine whether the level of CO2 emissions is acceptable with respect to the adopted reference level. Such an approach allows the potential of the logistic regression method to be used more widely.
In conclusion, there are many different methods for identifying key factors influencing CO2 emissions, e.g., machine learning algorithms. However, logistic regression also allows for estimating the probability of exceeding a specified CO2 emission level based on vehicle parameters, and for modeling the non-linear relationships between these parameters. For these reasons and others previously mentioned, logistic regression was chosen for this study.
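As a minimal sketch of how a logistic regression model turns instantaneous parameters into a class, the snippet below evaluates the logistic function for a single OBD sample. The coefficient values, the function names, and the 0.5 cut-off are illustrative assumptions and are not the estimates obtained in this study.

```python
import math


def logistic_probability(intercept, coefs, features):
    """P(Y = 1 | x) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(b * x for b, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-linear))


def classify(probability, cutoff=0.5):
    """Return 1 (emissions within the reference level) or 0 (level exceeded)."""
    return 1 if probability >= cutoff else 0


# Hypothetical coefficients for velocity (km/h), relative engine load (%),
# and throttle position (%); NOT the values estimated in the paper.
intercept = 2.0
coefs = [0.20, -0.10, -0.05]
sample = [30.0, 40.0, 20.0]

p = logistic_probability(intercept, coefs, sample)
print(round(p, 3), classify(p))  # 0.953 1
```

The output of the model is a probability, so the class label depends on the chosen cut-off; a different cut-off trades sensitivity against precision.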
  5. Results
  5.1. Logistic Regression Model During Vehicle Start-Up
The dependent variable adopted for the study is dichotomous in nature. A threshold of 200 g/km was adopted as the “ecological level of CO2 emissions”. A value of 0 was assigned when this emission level was exceeded, and 1 otherwise. The adopted value of 200 g/km results from the analysis of the average age of vehicles in Poland, which is approximately 14.5 years. This means that a significant portion of the cars in use were manufactured around 2009. The emission standards in force at that time are specified in Regulation (EC) No. 443/2009 of the European Parliament and of the Council of 23 April 2009, which specifies emission standards for new passenger cars as part of the European Union’s integrated approach to reducing CO2 emissions from light-duty vehicles [49]. This regulation sets the aforementioned CO2 emission limit of 200 g/km for manufacturers.
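The coding of the dependent variable described above can be sketched as follows. The function name and the sample readings are hypothetical, and treating a reading of exactly 200 g/km as not exceeding the limit is an assumption, since the text does not state how the boundary case is handled.

```python
CO2_LIMIT_G_PER_KM = 200.0  # reference level adopted in the study


def eco_label(co2_g_per_km: float, limit: float = CO2_LIMIT_G_PER_KM) -> int:
    """Return 0 when the limit is exceeded and 1 otherwise."""
    return 0 if co2_g_per_km > limit else 1


# Hypothetical instantaneous emission values in g/km.
readings = [150.0, 200.0, 240.5]
labels = [eco_label(r) for r in readings]
print(labels)  # [1, 1, 0]
```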
The logistic regression model for CO2 emissions during vehicle start-up was developed first. A univariate analysis of the impact of selected quantitative variables on the modeled variable was conducted. The linearity test was utilized for this purpose. The results of the test are presented in Table 3.
The test results indicate that, at the adopted significance level of α = 0.05, all variables should be included in the model’s construction. The estimated parameter values of the model, along with the results of the Wald test (the calculated Wald statistic and the corresponding p-value) for their significance, are presented in Table 4.
At the adopted significance level of α = 0.05, the calculated p-values for the variables of engine power, torque, and engine speed were greater than α (Table 4). This indicates that there is no basis for rejecting the null hypothesis for these variables; therefore, they are not statistically significant in the model. For the remaining variables, the p-value was below α, which is why they were included in the model construction.
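The Wald test used above compares each estimated coefficient with its standard error. A minimal sketch is given below, assuming the usual single-parameter form of the test with one degree of freedom; the estimates and standard errors are hypothetical and are not taken from Table 4.

```python
import math


def wald_test(estimate: float, std_error: float):
    """Return the Wald chi-square statistic (1 df) and its p-value."""
    z = estimate / std_error
    statistic = z * z
    # For 1 degree of freedom, P(chi2 > z^2) = P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return statistic, p_value


alpha = 0.05
# Hypothetical (estimate, standard error) pairs, not values from Table 4.
for name, est, se in [("velocity", 0.21, 0.03), ("engine speed", 0.0004, 0.0005)]:
    w, p = wald_test(est, se)
    verdict = "significant" if p < alpha else "not significant"
    print(name, round(w, 2), round(p, 4), verdict)
```

A variable is retained when its p-value falls below α; otherwise there is no basis for rejecting the null hypothesis that its coefficient equals zero.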
The model is given by Equations (9) and (10), in which the explanatory variables are vehicle velocity, relative engine load, and throttle position.
In the next step, the goodness of fit of the model was tested based on a comparison of predicted and observed values. For this purpose, the Hosmer-Lemeshow (HS) test was used. The calculated p-value of the HS test is 0.17 and does not warrant rejection of the null hypothesis, indicating that the model is a good fit to the data.
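The idea behind the goodness-of-fit test can be sketched as follows: observations are sorted by predicted probability, split into groups, and the observed and expected numbers of events in each group are compared. This is an illustrative implementation only. The use of 10 equal-size groups is an assumption, the data are synthetic, and the closed-form chi-square tail probability used here is valid only for an even number of degrees of freedom.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable; valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true, y_prob, groups: int = 10):
    """Return the Hosmer-Lemeshow statistic and p-value (df = groups - 2)."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        # Pearson-type term covering events and non-events together.
        statistic += (observed - expected) ** 2 / (expected * (1.0 - expected / size))
    return statistic, chi2_sf_even_df(statistic, groups - 2)


# Synthetic, perfectly calibrated data: 10 probability levels, 100 cases each.
y_prob, y_true = [], []
for k in range(10):
    level = (k + 0.5) / 10.0
    ones = 10 * k + 5  # exactly level * 100 observed events
    y_prob += [level] * 100
    y_true += [1] * ones + [0] * (100 - ones)

stat, p_value = hosmer_lemeshow(y_true, y_prob)
print(round(stat, 4), round(p_value, 4))
```

For perfectly calibrated data such as this, the statistic is close to zero and the p-value close to one; a small p-value would instead signal a poor fit.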
Next, the odds ratios were calculated to determine how driving with the assumed parameter values would affect eco-friendly driving (specifically, emissions of CO2 lower than 200 g/km). The sign next to the calculated parameter values indicates whether the analyzed factor is a stimulant or a destimulant of the odds of the phenomenon under study (Table 5).
The calculated odds ratios indicate that during vehicle start-up, an increase in driving velocity of 1 km/h results in a 23% increase in the odds that CO2 emissions will be below 200 g/km. In the case of relative engine load, a 1% increase leads to a 9% decrease in the odds that emissions will not exceed 200 g/km. Furthermore, when the relative throttle position increases by 1%, the odds decrease by 39%.
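The relationship between an odds ratio, the underlying model coefficient, and the resulting probability can be sketched as below. The odds ratios of 1.23, 0.91, and 0.61 are inferred from the rounded percentages quoted above, so the exact values in Table 5 may differ; the baseline probability of 0.5 and the function names are assumptions made for illustration.

```python
import math


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (odds_ratio - 1.0) * 100.0


def updated_probability(p_before: float, odds_ratio: float) -> float:
    """Probability after multiplying the odds p / (1 - p) by the odds ratio."""
    odds_after = odds_ratio * p_before / (1.0 - p_before)
    return odds_after / (1.0 + odds_after)


# Odds ratios implied by the rounded percentages reported for the start-up model.
for name, odds_ratio in [("velocity", 1.23), ("engine load", 0.91), ("throttle", 0.61)]:
    beta = math.log(odds_ratio)  # coefficient on the logit scale
    print(name, round(beta, 3), round(percent_change_in_odds(odds_ratio)),
          round(updated_probability(0.5, odds_ratio), 3))
```

The change in probability depends on the starting point: from a baseline of 0.5, a 23% increase in the odds moves the probability to about 0.55, not to 0.615.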
The developed CO2 emissions model can also be used to assess whether, under the given engine operating parameters, emissions will remain below 200 g/km. The quality of the predictions is evaluated by plotting ROC curves. As cross-validation was used to validate the model, ROC curves were plotted separately for the training and test sets, and the areas under the curves (AUC) were calculated (Figure 5 and Figure 6). Additionally, a classification matrix of actual observations and those predicted by the model was created (Table 6).
The calculated AUC values for the training set and the test set indicate excellent discrimination (as shown in Table 2). Additionally, the two values are very close to each other. Moreover, for the proposed model, 4665 cases were correctly classified (1473 true positives and 3192 true negatives), while 592 cases were misclassified (266 false positives and 326 false negatives). These findings demonstrate the accuracy and reliability of the developed model [48].
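The AUC can be interpreted as the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case. The sketch below computes it directly from this definition; the labels and scores are hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def auc_score(y_true, y_score) -> float:
    """AUC as the share of positive-negative pairs ranked correctly."""
    positives = [s for s, y in zip(y_score, y_true) if y == 1]
    negatives = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities for a handful of cases.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = auc_score(y_true, y_score)
print(round(auc, 3))  # 0.889
```

Computing this value separately on the training and test sets, as done in the study, shows whether the model discriminates equally well on data it has not seen.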
  5.2. Logistic Regression Model for Urban Driving
The next step involved developing a logistic regression model for CO2 emissions during urban driving cycles. As with the previous case, the process began by conducting a univariate analysis of the influence of selected quantitative variables on the dependent variable. The LR test results are presented in Table 7.
The calculated p-value in the test for parameter significance was 0.00 for all variables. This indicates that all factors should be included in the model’s construction.
In the next step, a multivariate model was developed. The estimated parameter values for the model, along with the results of the Wald tests for their significance, are presented in Table 8.
For the variable engine speed, the obtained p-value is 0.06, indicating that it is not statistically significant at the accepted significance level of α = 0.05. The remaining variables were found to be statistically significant (p = 0.00).
The model is given by Equations (11) and (12), in which the explanatory variables are vehicle velocity, relative engine load, engine power, torque, and throttle position.
In the next step, the goodness of fit was examined using the HS test. The calculated p-value is 0.12, which provides no basis for rejecting the null hypothesis, indicating that the model fits the data well.
Subsequently, odds ratios were calculated to determine the likelihood that driving with the assumed parameter values would lead to environmentally friendly driving (Table 9).
The calculated odds ratios indicate that, in urban driving cycles, an increase in driving velocity by 1 km/h increases the odds that CO2 emissions will be below 200 g/km by 16%. An increase in engine torque by 1 Nm raises the odds by 3%. On the other hand, a 1% increase in engine load reduces the odds of keeping emissions under 200 g/km by 15%, a 1 kW increase in engine power reduces them by 18%, and a 1% increase in relative throttle position reduces them by 19%.
To assess the model’s predictive quality, ROC curves were again plotted (Figure 7 and Figure 8), and a classification matrix of actual and model-predicted observations was generated (Table 10).
The calculated AUC values for the respective datasets are 0.963 for the training set and 0.964 for the test set, indicating excellent discrimination (as shown in Table 2). The calculated AUC values are close to each other (the difference between them does not exceed 0.05), which supports the validity of the model. Moreover, for the proposed model, 6419 cases were correctly classified (3894 true positives and 2525 true negatives), while 529 cases were misclassified (360 false positives and 169 false negatives). These findings support the correctness of the model.
  5.3. Logistic Regression Model for Highway Driving
In the final step, a logistic regression model for CO2 emissions was developed for the case of the vehicle moving on the highway. As with the previous case, the process began by conducting a univariate analysis of the influence of selected quantitative variables on the dependent variable. The results of the test are presented in Table 11.
The calculated p-value in the test for parameter significance was 0.00 for all variables, and therefore all should be taken into account when building the model.
In the next step, a multivariate model was developed. The estimated values of the model parameters, along with the results of the Wald tests for their significance, are shown in Table 12.
For the variable engine speed in RPM, the obtained p-value is 0.08, indicating that it is not statistically significant at the accepted significance level of α = 0.05. The remaining variables were found to be statistically significant (p = 0.00).
The model is given by Equations (13) and (14), in which the explanatory variables are vehicle velocity, relative engine load, engine power, torque, and throttle position.
In the next step, the goodness of fit of the model was tested based on a comparison of predicted and observed values. For this purpose, the Hosmer-Lemeshow (HS) test was used. The calculated p-value of the HS test is 0.07 and does not warrant rejection of the null hypothesis, indicating that the model is a good fit to the data.
Subsequently, odds ratios were calculated to determine the likelihood that driving with the assumed parameter values would lead to environmentally friendly driving (Table 13).
The calculated odds ratios indicate that during highway driving, an increase in driving velocity of 1 km/h results in a 15% increase in the odds that CO2 emissions will be below 200 g/km. An increase in engine torque by 1 Nm raises the odds by 7%. Additionally, when the relative throttle position increases by 1%, the odds increase by 17%. Conversely, an increase in relative engine load by 1% decreases the odds that emissions will not exceed 200 g/km by 28%, and an increase in engine power by 1 kW decreases the odds by 27%.
Subsequently, ROC curves were again plotted to assess the quality of predictions (Figure 9 and Figure 10), and a classification matrix of actual observations and those predicted by the model was created (Table 14).
The calculated AUC values for the respective datasets are 0.967 for the training set and 0.966 for the test set, indicating excellent discrimination (as shown in Table 2). In addition, the calculated AUC values are close to each other (the difference between them does not exceed 0.05), which allows the model to be considered correct. Moreover, for the proposed model, 9830 cases were correctly classified (6024 true positives and 3805 true negatives), while 706 cases were misclassified (430 false positives and 276 false negatives). These findings demonstrate the accuracy and reliability of the developed model.
  6. Discussion
In summary, a comparison of the logistic regression models for each driving mode was conducted (Table 15). In all models, the statistically significant variables were the vehicle velocity, relative engine load, and relative throttle position. Additionally, for the models assessing urban and highway driving, the significant variables included engine power and torque.
In the analyzed case, the impact of selected variables on the odds of eco-friendly driving was examined, where eco-friendly driving means CO2 emissions below 200 g/km. In each model, an increase in driving velocity of 1 km/h increases the odds of eco-friendly driving, by 23%, 16%, and 15% during vehicle start-up, in urban mode, and on the highway, respectively. A similar effect occurs for engine torque, where an increase of 1 Nm raises the odds by 3% and 7% for urban driving and highway driving, respectively.
It is important to note that as the relative engine load and engine power increase, the odds of driving with emissions below 200 g/km decrease. Specifically, an increase in relative engine load by 1% decreases the odds by 9%, 15%, and 28% (for vehicle start-up, urban driving, and highway driving, respectively). Furthermore, an increase in engine power by 1 kW leads to a decrease of 18% and 27% (for urban and highway driving, respectively).
It should be emphasized that, for the variable representing the relative throttle position, an increase of 1% reduces the odds of eco-friendly driving by 39% and 19% during vehicle start-up and in urban mode, respectively, while in highway driving it increases the odds by 17%.
Finally, the values of indicators characterizing the goodness of fit of the constructed models were calculated, and the results are summarized in Table 16.
The logistic regression model for vehicle start-up (Table 16) shows the lowest sensitivity, approximately 82%, among all considered models, along with an accuracy of around 89%. However, a precision of about 85% indicates that the model is moderately effective at avoiding false alarms (FP), which is beneficial from the perspective of monitoring CO2 emissions [48].
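The indicators quoted for the start-up model can be reproduced from the classification counts reported in Section 5.1. The sketch below uses the standard definitions of sensitivity, precision, and accuracy; the function name is an assumption, and it is not known whether Table 16 was computed in exactly this way.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Sensitivity, precision, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),               # share of actual positives found
        "precision": tp / (tp + fp),                 # share of predicted positives that are right
        "accuracy": (tp + tn) / (tp + tn + fp + fn), # share of all cases classified correctly
    }


# Counts reported for the start-up model's classification matrix.
metrics = classification_metrics(tp=1473, tn=3192, fp=266, fn=326)
print({name: round(value, 3) for name, value in metrics.items()})
# {'sensitivity': 0.819, 'precision': 0.847, 'accuracy': 0.887}
```

These values round to the approximately 82%, 85%, and 89% discussed above.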
In the case of urban driving (Table 16), the values of the discussed parameters reach significantly higher levels, with sensitivity at around 96%, precision at about 92%, and accuracy at 93%. These results indicate a high effectiveness of the developed model in urban environments, which may stem from more diverse driving conditions [48].
On the other hand, the logistic regression model for highway driving (Table 16) achieves the highest sensitivity (around 97%) and precision (about 93%) among the created models; only its accuracy, approximately 92%, falls slightly short of that of the urban driving model. The high efficiency of this model is attributed to the more stable driving conditions on highways than in urban driving, which greatly facilitates the classification process.
The best accuracy, i.e., the percentage of correctly classified observations, is characteristic of the urban driving model (about 93%), while the lowest accuracy is observed in the vehicle start-up model (about 89%).
  7. Conclusions
Logistic regression can be used to build models that classify levels of CO2 emissions in vehicles with internal combustion engines based on real operational data. Integrating the discussed CO2 classification models with the available vehicle information systems can lead to increased environmental awareness among drivers and promote more eco-friendly driving styles.
These classification models, along with built-in onboard systems, could provide drivers with real-time recommendations regarding CO2 emissions based on the current driving mode. For example, they could suggest minimizing abrupt velocity changes that lead to increased emissions (during vehicle start-up), optimizing velocity to maintain smooth driving (in urban driving), and maintaining a constant velocity (on highways). Such solutions would enable the average driver not only to exert a specific impact on environmental protection but also to reduce the operational costs of their vehicle.
The created logistic regression models exhibit varying effectiveness depending on the driving mode, achieving the best results (i.e., the highest values of sensitivity, precision, and accuracy) during urban driving and highway driving, and the lowest during the vehicle start-up cycle. In the context of vehicle start-up, the unstable operating conditions of the engine significantly hinder the classification of emission levels, resulting in the lower efficiency of the model in this cycle.
During vehicle start-up, the most significant factors affecting the probability of eco-friendly driving (i.e., CO2 emissions below 200 g/km) are vehicle velocity, relative engine load, and the relative position of the throttle. In urban and highway driving, the list of parameters also needs to include engine power and torque. Furthermore, the statistical analysis revealed that engine speed in RPM does not significantly impact CO2 emissions at the assumed significance level of α = 0.05.
A limitation of this work is the limited sampling frequency of some data (resulting from sensor limitations, e.g., for vehicle speed), which may make it impossible to capture important characteristics, especially in the short start-up phase of the driving cycle. Directions for further research may include applying other machine learning methods, e.g., decision trees, support vector machines (SVM), random forests, or neural networks, in search of algorithms providing the highest values of classification goodness-of-fit indicators.