Article

Predicting Financial Performance in the IT Industry with Machine Learning: ROA and ROE Analysis

1
Accounting and Tax Department, Korkuteli Vocational School, Akdeniz University, Antalya 07800, Türkiye
2
Department of Management Information Systems, Faculty of Social and Human Sciences, Akdeniz University, Antalya 07800, Türkiye
3
Department of Business Administration, Faculty of Economics and Administrative Sciences, Akdeniz University, Antalya 07058, Türkiye
4
Department of Finance and Banking, Faculty of Applied Sciences, Akdeniz University, Antalya 07058, Türkiye
*
Authors to whom correspondence should be addressed.
Appl. Sci. 2024, 14(17), 7459; https://doi.org/10.3390/app14177459 (registering DOI)
Submission received: 19 July 2024 / Revised: 21 August 2024 / Accepted: 21 August 2024 / Published: 23 August 2024
(This article belongs to the Special Issue Machine Learning and Soft Computing: Current Trends and Applications)

Abstract

IT is recognized as the engine of the digital world. Because it spans multiple sub-sectors, it is a driving force of the economy, and the sector is increasingly the center of investor attention. Since investors prioritize profitability, accurate and reliable profitability forecasts are a top priority for managers. The aim of this study is to estimate the profitability of IT sector firms traded in Borsa Istanbul using machine learning methods. The financial data of 13 technology firms listed in the Borsa Istanbul Technology index and operating between March 2000 and December 2023 were used. Return on assets (ROA) and return on equity (ROE) were estimated using artificial neural networks (ANN), multiple linear regression (MLR) and decision tree regression (DTR). The results show that ANN and MLR are particularly effective.

1. Introduction

In the new world order, where digitalization is at the center, technology breakthroughs are widely regarded as essential to sustainable growth. Some of the world’s leading economists describe the current period in the economy as the “IT (Information Communication Technologies) Techno-Economic Paradigm”. In this period, which is still in its early stages, the activities of the IT sector, such as hardware, software, communication and telecommunication services, and content production of information technology services, are expected both to drive the economy and to radically change activities in other sectors. According to TÜBİSAD, the global IT market size was projected to grow by 13% to USD 4.3 trillion in 2023, with the information technology market growing by 25.7% and the communication technology market by 1.4%. By 2026, the market size is projected to reach USD 5.6 trillion.
The IT sector has been attractive to investors since the mid-1990s. However, since the bursting of the dot-com bubble in 2000, investment decisions have prioritized the profitability of these companies rather than the idea or product that a company comes up with. In the same way, the managers of these companies prioritize realized profitability and aim to increase profitability ratios in order to support investors’ decisions.
In this process, ROA (return on assets) and ROE (return on equity) are the ratios that managers especially work on. ROA shows how much profit a company makes for the assets it owns, and it shows how efficiently it uses its assets. ROE, on the other hand, expresses how effectively the investments made by shareholders in the company are used to generate profit. In this respect, it is known to be one of the most important profitability indicators for investors. The predictability of ROA and ROE plays an important role in the growth and sustainability of companies. Accurate estimation of these indicators, on the other hand, will form the basis for strategic decisions to be taken by managers.
In profitability forecasting, machine learning methods have been reported to provide more successful results in different sectors than traditional methods such as regression. The reason is that these methods generally produce effective solutions in cases of nonlinearity and multicollinearity in financial data sets.
Because the IT sector is considered the locomotive of a sustainable world, the decisions taken by its managers carry great weight, and because managers base their decisions on past-period data, forecasts of future performance become all the more important. The motivation of this study is to help attract investors by predicting profitability in the IT sector, which is the key to the digital world, and to assess the sustainability of businesses. For this purpose, profitability studies conducted with machine learning methods in different sectors are first examined in the literature section. Then, the model designed in the methodology section is tested using machine learning methods, and the results are discussed.
This study differs from other studies in that it examines the IT sector, which has not been addressed as a sector in the literature before. The fact that companies in the IT sector need to follow the pace of technology and make extra efforts in innovation compared to companies in other sectors requires these companies to have a strong financial structure and the potential to generate profit. Therefore, the study will contribute to the literature in terms of creating the ideal model with ideal machine learning methods.

2. Literature

It is known that ROA and ROE are among the indicators used to measure, evaluate and analyze the performance of enterprises. When the literature is examined, it is seen that these ratios provide important information about the profitability, performance and financial structure of the enterprise. ROA is used to measure how efficiently a company uses its assets, while ROE provides information about the efficiency of the use of the company’s equity. Accurate and reliable analysis of these ratios is critical for both internal and external users of information.
Studies on the importance of ROA focus mainly on determining how effectively the assets of the enterprise are used. For example, Penman (2013) and Damodaran (2007) emphasize that a high ROA indicates that the enterprise earns high profits by using its assets efficiently, which reflects its operational efficiency and management success [1,2]. In this context, ROA is relevant to all operational processes of the enterprise, from production processes to marketing activities.
ROE is considered in the literature as a critical tool for assessing the return that companies generate on equity for their shareholders. Brigham and Ehrhardt (2017) emphasize that a high ROE indicates that the firm generates high returns by using its equity effectively, which is attractive to investors [3]. ROE is also used to assess the effectiveness of a firm’s financial strategies and capital structure, and many studies note that it reflects the sustainable growth rate and competitiveness of the business.
The relationship between ROA and ROE and the joint evaluation of these ratios are frequently emphasized in the literature. For example, Higgins (2012) and Ross, Westerfield and Jaffe (2016) state that analyzing these two ratios together allows for a more holistic understanding of business performance [4,5]. ROA focuses on the efficiency of assets, while ROE focuses on return on equity. Analyzing the two ratios together provides information about a company’s debt utilization and financial risk management. For this reason, ROA and ROE are treated in the literature as key financial indicators that should be considered together for a comprehensive evaluation of business performance. Table 1 presents studies that include ROA and ROE.
In the literature, it is known that machine learning methods give successful results in profitability forecasting. However, it has been determined that ROA and ROE analysis has not been conducted in the IT sector, which is the key to the future. For this reason, this study is expected to contribute to the literature.

3. Materials and Methods

In this study, three machine learning methods were used to predict ROA and ROE values: artificial neural networks (ANN), multiple linear regression (MLR) and decision tree regression (DTR). Each offers different advantages depending on the modeling needs. Neural networks are well suited to modeling complex relationships; multiple linear regression offers simple and interpretable relationships; and decision trees are effective for heterogeneous and non-linear data structures. In selecting each method, factors such as the nature of the data set, prediction accuracy requirements, and model understandability were taken into account. The model consists of 14 independent variables and 1 dependent variable.
Neural networks are primarily chosen for their ability to model complex and non-linear relationships. These methods have the capacity to learn complex relationships between data and can provide a high degree of abstraction and feature learning due to their multilayer structure. Neural networks can exhibit strong performance, especially in large data sets and with many variables. In the prediction of financial indicators such as ROA and ROE, neural networks can model the non-linear structure of the relationships of these indicators with economic and financial factors. Moreover, the performance of the model can be improved regularly, and more accurate predictions can be made by hyperparameter optimization.
Multiple linear regression is a basic method for modeling linear relationships between a dependent variable and multiple independent variables. This approach may be suitable for modeling linear relationships that are frequently observed in financial data. In the estimation of financial ratios such as ROA and ROE, this method is used to analyze linear relationships between the independent variables and the ratio being predicted. The advantages of multiple linear regression include the understandability of the model and the interpretability of the results. In addition, the model is relatively simple to set up and implement and requires comparatively little data.
Decision tree regression is used to model the relationships between different groups and structures in a data set and makes predictions by dividing the data into various decision points. This method can be particularly effective when there are significant thresholds and non-linear relationships between data points. Financial indicators such as ROA and ROE may show different effects under different economic and financial conditions, so decision trees can be useful for analyzing such heterogeneous data structures. Decision trees clearly visualize the interactions of variables and important features within the data, which facilitates the interpretation of the model. Furthermore, this method can be more robust to noise and missing values in the data set.

3.1. Data Set

The data in the study belonged to 13 technology firms included in the Borsa Istanbul Technology index and operating between March 2000 and December 2023. Table 2 shows the companies whose data were used in the study.
The data of these companies were obtained from the Finnet financial analysis program and edited by the researchers. Quarterly financial period data of the companies were used. In this study, there were 27 fiscal periods for each company. A total of 810 data points for each company and 10,530 data points in total were analyzed. Table 3 presents the variables of the study.
In the business finance literature, financial ratios consist of five main groups. These are liquidity ratios, financial structure ratios, profitability ratios, operating ratios and stock market ratios. With these groups of ratios, the situation of the company in general and the sector in particular is taken into consideration, and the model is created through dependent variables. Vic Anand et al. (2019), Thin-Hsuan, Chen and Rong-Cih Chang (2021), Stewart, J et al. (2023), Phan Vu Hang and Le Tung Duong (2024), and Xinqi Dung et al. (2024) created models in the same way to predict ROA and ROE in their studies. While designing the model of the study, the main variables related to ROA and ROE, which are both used in corporate finance and have an important place in the IT sector, were selected as independent variables. Especially for the IT sector, operating ratios were emphasized. This is because it is considered important to manage the operational processes of the sector. The designed model was applied to each company separately using artificial neural networks (ANN), multiple linear regression (MLR) and decision tree regression (DTR) methods [6,8,11,13,17].
A preliminary analysis was performed to assess the balance and distribution of the data. The data were checked for missing values, and statistical methods were used to detect outliers. The data set was also checked for inconsistencies, incorrect entries, and data format errors. The central tendency of each variable was determined by calculating its mean and median, and its spread was examined through the standard deviation and variance. The findings indicate that the data do not show a distribution that would have a negative impact on the modeling process.

3.2. Artificial Neural Networks

Artificial neural networks have emerged as a result of the mathematical modeling of the learning process inspired by the human brain. Artificial nerve cells are similar to biological nerve cells. Artificial neurons also form artificial neural networks by establishing connections between them. Just like biological neurons, artificial neurons have sections where they receive input signals, collect and process these signals, and transmit outputs [19].
An artificial neural cell consists of five components: inputs, weights, a summation junction, an activation function and an output, as shown in Figure 1.
Inputs to the neural cell can come from other cells or directly from the outside world. The information arriving at the neural cell is multiplied by the weight of the connections they come from before reaching the nucleus through the inputs. In this way, the effect of the inputs on the output to be produced can be adjusted. The values of these weights can be positive, negative or zero. Inputs with a weight of zero have no effect on the output. The summation function is a function that calculates the net input of a neural cell by adding the inputs multiplied by the weights. For each neuron, the total function is calculated by multiplying the values of the neurons connected to that neuron by the weight.
$$y = \sum_{i=1}^{n} w_i x_i + b \tag{1}$$
Here, y represents the output, x represents the input, w represents the weight, and b represents the bias.
The activation function processes the net input to the cell and determines the output that the cell will produce in response to this input. The value coming out of the activation function is the output value of the cell. This value can either be given to the outside world as the output of the artificial neural network, or it can be reused within the network. A widely used activation function is the sigmoid function shown in Equation (2) [20,21].
$$\varphi(y) = \frac{1}{1 + e^{-y}} \tag{2}$$
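The weighted sum and the sigmoid activation described above can be sketched for a single artificial neuron as follows. This is only an illustration: the input values, weights and bias are hypothetical and are not taken from the study's data or trained network.

```python
import math

def sigmoid(y):
    """Logistic sigmoid: maps any real net input to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through the sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(net)

# Hypothetical inputs and weights, chosen only for illustration.
x = [0.5, 0.2, 0.9]
w = [0.4, -0.7, 0.0]  # a zero weight removes the third input's influence
b = 0.1
print(neuron_output(x, w, b))
```

Changing the third input leaves the output unchanged because its weight is zero, which matches the statement that inputs with a weight of zero have no effect on the output.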
Artificial neural networks are divided into feed-forward and feedback networks according to the way the neurons are connected to each other. In feed-forward networks, neurons are in regular layers from input to output. There is only a connection from one layer to the following layers. The information coming into the artificial neural network passes through the input layer and then through the intermediate layers and the output layer, respectively, and then outputs to the outside world. In feedback neural networks, unlike feed-forward neural networks, the output of a cell is not only given as input to the layer of the next cell. It can also be connected as input to any cell in the previous layer or in its own layer [22,23].

3.3. Multiple Linear Regression

Linear regression is a data analysis technique that predicts the value of unknown data using another relevant and known data value. It mathematically models the unknown or dependent variable and the known or independent variable as a linear equation [24]. The method used to explain the cause–effect relationships between two or more independent variables affecting a variable with a model and to determine the effect levels of these independent variables is called multiple regression analysis [25]. In this respect, it is similar to simple regression. There is more than one prediction variable in the multiple regression equation. Least squares (LS) estimators can be used to estimate the model parameters [26].
The most general multiple regression equation, where Xi are the independent variables and Y is the dependent variable, is as follows
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \dots + \beta_p X_{ip} + \varepsilon_i, \quad i = 1, 2, 3, \dots, n \tag{3}$$
In the equation, βj (j = 0, 1, 2, …, p) represent the regression coefficients, with β0 as the intercept; εi represents the ith random error value; Yi represents the ith observation of the dependent variable; and Xij (j = 1, 2, 3, …, p) represents the ith observation of the jth independent variable.
In multiple linear regression analysis, the contribution of some of the independent variables to the model may be insufficient. For this reason, it is necessary to determine the independent variables that will explain the dependent variable in the “most appropriate” way and to remove unimportant variables from the model [27]. This process is called “variable selection”. In multiple linear regression, each independent variable has a different degree of influence on the dependent variable. Therefore, unlike simple linear regression, which has a single slope coefficient, each independent variable has its own coefficient, and these coefficients need not be equal [28].
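As a minimal sketch of least squares estimation for the multiple regression equation, the coefficients can be obtained by solving the normal equations (XᵀX)β = Xᵀy. The function names and the toy data below are hypothetical, and the study's own software and settings are not known from the text; the example uses only the Python standard library.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(X, y):
    """Return [b0, b1, ..., bp] minimizing the sum of squared residuals."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(beta, x_row):
    """Evaluate the fitted linear equation for one observation."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x_row))

# Toy data generated from y = 1 + 2*x1 - 3*x2 (illustrative, not study data).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
beta = fit_ols(X, y)
print(beta)
```

Solving the normal equations directly is adequate for a small, well-conditioned example; with strongly correlated predictors, a QR- or SVD-based solver from a numerical library would usually be preferred.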

3.4. Decision Tree Regression

A decision tree is a flowchart-like diagram that maps all potential solutions to a given problem. Decision trees are in the form of a tree structure that can be built on both regression and classification models [29,30]. Regression is used on numerical target data, while classification is used on categorical data. Tree-based methods are simple and convenient in terms of interpretation.
A decision tree has a predefined target variable. In terms of their structure, they offer a top-down strategy [31]. A decision tree is a structure used to partition a dataset containing a large number of records into smaller sets by applying a set of decision rules, in other words, by applying simple decision-making steps to divide large amounts of records into very small groups of records [32].
The decision tree structure consists of a root node, internal nodes and leaf nodes. The root node is the top node of the tree. It splits the dataset into two or more subsets. Internal nodes are the nodes below the root node and above the leaf nodes. They perform a feature test and divide the data into subsets. Leaf nodes are the terminal nodes below the internal nodes. These nodes contain a class label or numeric value, which is the value by which the data point is classified or predicted [33].
One of the most important problems in decision trees is the criteria for partitioning or branching from any root. Examples of algorithms that use entropy-based partitioning are ID3 and its advanced form, C4.5. As for classification and regression trees, Twoing and Gini algorithms can be used [34].
Entropy is a measure of uncertainty. If all records in a dataset share a single label, the entropy takes its minimum value of zero. Therefore, the data should be divided in a way that minimizes the entropy of the resulting subsets. The better the split, the more successful the predictions.
$$H = -\sum_{x} p(x) \log p(x) \tag{4}$$
Information gain is used to determine the best split:
$$Gain(S, D) = H(S) - \sum_{V \in D} \frac{|V|}{|S|} H(V) \tag{5}$$
S is the original data set, and D is a partition of the set. Each V is a subset of S. The information gain is defined as the difference between the entropy of the original dataset before the split and the weighted sum of the entropies of the subsets produced by the split.
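The entropy and information gain definitions above can be illustrated with a short sketch. The labels are hypothetical categorical values, and the base-2 logarithm is an assumption, since the text does not state which base is used.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels (log base 2 assumed)."""
    n = len(labels)
    counts = Counter(labels)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def information_gain(labels, partition):
    """Entropy before the split minus the size-weighted entropy after it."""
    n = len(labels)
    weighted = sum(len(v) / n * entropy(v) for v in partition)
    return entropy(labels) - weighted

# Hypothetical labels: a split that separates the classes perfectly.
s = ["high", "high", "low", "low"]
perfect_split = [["high", "high"], ["low", "low"]]
useless_split = [["high", "low"], ["high", "low"]]
print(information_gain(s, perfect_split))
print(information_gain(s, useless_split))
```

A split that isolates each class recovers all of the original entropy as gain, while a split whose subsets have the same class mix as the parent yields no gain.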
Pruning is the process of removing parts from the decision tree that do not contribute to the classification. This makes the decision tree both simple and understandable. Preliminary pruning is performed when the tree is constructed. If the values of the partitioned attributes are not above a certain threshold (error tolerance), the tree partitioning process is stopped at that point, and the dominant class label in the currently available set is created as a leaf. Subsequent pruning takes place after the tree has been created. This can be performed by deleting subtrees to create leaves, raising subtrees, and cutting branches [35].

4. Results and Discussion

In this study, a series of data preprocessing steps were performed on the data set. The missing data rate was calculated for each variable. If the missing data rate is above 5%, this may affect the data analysis and modeling processes. Missing values can be filled with mean, median or mode. For continuous data, the mean or median is usually used, and for categorical data, the mode is used. For this study, the mean method was used. Since outliers may have a negative impact on the modeling process, these values were removed from the data set. The data were converted to the range [0, 1] with the min–max normalization method. Correlation between features was analyzed for feature selection. The dataset was adapted to the modeling process, and the overall performance of the model was improved.
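A minimal sketch of the missing-data check and mean imputation described above is given below. Representing missing entries as `None` and the function names are assumptions made for illustration; the study's actual tooling is not specified.

```python
def missing_rate(values):
    """Fraction of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)

def impute_mean(values):
    """Replace missing entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# Hypothetical ratio column with one missing quarter.
column = [1.0, None, 3.0, 4.0]
print(missing_rate(column))
print(impute_mean(column))
```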
The forward-elimination (forward selection) method was used for feature selection in the study. This method starts from a model with no features and adds the best-performing feature at each step. The process continues until a stopping criterion is reached. Typical criteria are the absence of a significant improvement in model performance, a maximum number of features, or some other specified target.
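The stepwise procedure described above can be sketched generically as follows. The scoring function, the feature names and the improvement threshold are all hypothetical; in practice the score would be a validation metric such as R² obtained by refitting the model on each candidate feature set.

```python
def forward_selection(candidates, score, min_improvement=1e-4, max_features=None):
    """Greedily add the feature that most improves score(feature_list)."""
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining and (max_features is None or len(selected) < max_features):
        trial_scores = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(trial_scores, key=lambda t: t[0])
        if top_score - best_score < min_improvement:
            break  # stopping criterion: no significant improvement
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score

# Toy additive score standing in for a real model evaluation (assumption).
toy_gain = {"current_ratio": 0.50, "asset_turnover": 0.30, "leverage": 0.00005}

def toy_score(features):
    return sum(toy_gain[f] for f in features)

print(forward_selection(list(toy_gain), toy_score))
```

With this toy score the third feature is rejected because its contribution falls below the improvement threshold, which corresponds to the "no significant improvement" stopping criterion.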
Normalization, partitioning and analysis were performed, respectively. In the first stage, the data were normalized according to the specified method. Min–max normalization is a technique frequently used in data preprocessing. This method transforms data expressed in different ranges into a range between 0 and 1, making them comparable. The smallest (minimum) and largest (maximum) values are found for each feature in the data set. For each data point, the minimum value is subtracted from the value for that feature and divided by the difference between the maximum value and the minimum value. The min–max method normalizes the data linearly. While the minimum is the lowest value a data point can take, the maximum is the highest value a data point can take [36].
$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \tag{6}$$
In this equation, xi′ denotes the normalized value; xi denotes the input value; xmin denotes the smallest number in the input set; xmax denotes the largest number in the input set.
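The min–max transformation can be sketched for a single feature as follows. Mapping a constant feature to zero is an assumption added here to avoid division by zero; the text does not say how such a case is handled.

```python
def min_max_normalize(values):
    """Linearly rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information, map to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical feature values.
print(min_max_normalize([2, 4, 6]))
```

In a train/test setting, the minimum and maximum are usually taken from the training data only and then applied to the test data, so that no information from the test set leaks into preprocessing.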
In this study, R2 and RMSE statistical metrics were used to analyze the success and error values of the model.
R2 is a measure of the model’s explanatory power. It is a statistical measure of how close the data are to the fitted regression line and expresses the proportion of the variance in the dependent variable that is explained by the model. It therefore reflects the difference between the observations in the data set and the predictions generated by the model [37]. R2 typically takes a value between 0 and 1. An R2 value of 1 means that the model fits the data perfectly, while a value of 0 means that the model explains none of the variance and performs poorly on a data set that it has not encountered before [38]. The closer R2 is to 1, the better the model fits the data.
$$R^2 = 1 - \frac{\text{Unexplained Variation}}{\text{Total Variation}} \tag{7}$$
RMSE measures the average magnitude of the error. It is found by taking the square root of the mean of the squared differences between the predicted and true values over all values in the data set. Because the errors are squared, RMSE is heavily influenced by outliers, and it is often used to reconstruct the error distribution [39].
$$RMSE = \sqrt{\frac{\sum_{j=1}^{n} e_j^2}{n}} \tag{8}$$
where e represents the amount of error, and n represents the amount of data.
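Both metrics can be computed directly from paired actual and predicted values, as in the following sketch. Here the unexplained variation is taken as the residual sum of squares and the total variation as the sum of squared deviations from the mean, which is the usual reading of the R2 formula; the sample values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Square root of the mean squared prediction error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r_squared(actual, predicted):
    """1 minus residual sum of squares over total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical values for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(actual, predicted), r_squared(actual, predicted))
```

Predicting the mean of the actual values for every observation gives an R2 of exactly 0, and on unseen data a model can do worse than that, in which case this formula returns a negative value.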
It is crucial to use a variety of data sets and techniques to verify the applicability and robustness of the model. Training–test separation, cross-validation, external data sets, and comparison with standard data sets are some of these methods. In this study, the data set was divided into two as training and testing. This method allows you to evaluate the performance of the model against data that it has not seen before.
In the training phase, the most appropriate parameters for the success of the model are determined, and in the testing phase, the success of the model is measured according to statistical metrics [40]. Different ratios were tested in this study, and it was decided that the most appropriate ratio was 70% training and 30% testing.
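A 70%/30% split can be sketched as below. Shuffling with a fixed seed is an assumption: the text does not state whether the split was random or chronological, and for quarterly financial series a chronological split (earlier periods for training, later ones for testing) may be the more cautious choice.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle row indices reproducibly and split them into two parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * len(rows)))
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test

# Ten hypothetical observations, identified here by their index.
train, test = train_test_split(list(range(10)))
print(len(train), len(test))
```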
In this study, a feedback neural network model was used for the ANN. Thus, the error was minimized. It was observed that the model with four hidden layers and three neurons in each layer gave the most successful results. The learning coefficient (λ) is a factor that determines the speed of updating the weights in artificial neural networks. While a high value allows the weights to be updated quickly, very large values can lead to instability during the training of the network. The momentum coefficient (α) is a coefficient that increases the effect of previous updates in updating the weights. This helps to prevent the network from becoming stuck at local minima and can speed up the learning process. The learning coefficient (λ) was chosen to be 0.6, and the momentum coefficient (α) was 0.8. The number of iterations was set as 1000.
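The roles of the learning coefficient and the momentum coefficient can be illustrated with the common gradient-descent-with-momentum update, Δw(t) = −λ·∂E/∂w + α·Δw(t−1). That the study's software uses exactly this form is an assumption, and the quadratic error function below is a toy stand-in for a network's error surface; only the values λ = 0.6, α = 0.8 and 1000 iterations come from the text.

```python
def minimize_with_momentum(grad, w0, learning_rate=0.6, momentum=0.8, iterations=1000):
    """Gradient descent on one weight, with a momentum term."""
    w = w0
    delta = 0.0  # previous weight update
    for _ in range(iterations):
        # New update: gradient step plus a fraction of the previous update.
        delta = -learning_rate * grad(w) + momentum * delta
        w += delta
    return w

# Toy error E(w) = (w - 3)^2 with gradient 2 * (w - 3); minimum at w = 3.
w_final = minimize_with_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

On this toy problem the weight oscillates around the minimum before settling, which shows how momentum carries information from previous updates into the current one.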
In multiple linear regression, there is more than one independent variable in the model, and the effects of these variables on the dependent variable may differ from each other. In this study, the forward-elimination (forward selection) method was used for the selection of variables. The model initially contained a single variable, and the number of variables increased with each step.
In the decision tree method, which was another method used in this study, the Gini index was used as the quality measure. The proportion of records belonging to each class is calculated, and the sum of the squares of these proportions is subtracted from 1 [41]. The Gini value lies between 0 and 1, and the closer the result is to 0, the better the discrimination. If the number of records in a node is less than or equal to 2, the tree is designed to grow no further.
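The Gini calculation described above can be sketched as follows, using hypothetical class labels. How the study's software applies this measure to continuous ROA and ROE targets is not stated, so the example is limited to the categorical case.

```python
from collections import Counter

def gini_index(labels):
    """1 minus the sum of squared class proportions in a node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# Hypothetical node contents.
print(gini_index(["high", "high", "high", "high"]))  # pure node
print(gini_index(["high", "low", "high", "low"]))    # evenly mixed node
```

A pure node gives a Gini value of 0, and an evenly mixed two-class node gives 0.5, so lower values correspond to better discrimination.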
The results of the models using machine learning methods for ROA are presented in Table 4.
The numerical results are presented in Figure 2 (R2 values).
In general, the ANN model provides a good fit with high R2 values for most companies. The highest R2 value of ANN is 0.973 for LINK, and the lowest R2 value is 0.654 for INDES. The MLR model also performs quite well and generally has high R2 values. The highest R2 value of MLR is 0.947 for PKART, and the lowest R2 value is 0.662 for INGRM. The DTR model shows a wider range of R2 values and more variability in its performance. The highest R2 value of DTR is 0.983 for LINK, while the lowest R2 value is 0.189 for INDES. In conclusion, the ANN model performs consistently well for most companies, with good generalization and predictive power. The MLR model also performs strongly, outperforming ANN in some cases (e.g., KAREL and PKART). The DTR model shows the highest R2 values for some companies (e.g., LINK) and the lowest performance for others (e.g., INDES). This shows that the DTR model is more sensitive to the data range.
Figure 3 shows the RMSE values of companies’ ROA values according to three different machine learning methods.
In ANN, MLR and DTR, RMSE is the square root of the mean squared difference between the predicted values of a model and the actual values. In this context, lower RMSE values indicate better performance. The ANN model shows low RMSE values in general and exhibits a good prediction performance. The lowest RMSE value of the ANN is for NETAS (0.03), and the highest RMSE value is for ALCTL (0.159). The MLR model also performs well with low RMSE values in general. The lowest RMSE value of MLR is for ARENA (0.082), and the highest RMSE value is for DESPC (0.196). The DTR model, on the other hand, generally shows higher RMSE values, indicating that the model’s forecasting performance is more variable. The lowest RMSE value of DTR is for FONET (0.292), and the highest RMSE value is for LINK (0.49). In general, the ANN and MLR models provide more consistent and reliable predictions with lower RMSE values, while the DTR model exhibits a more variable performance, showing higher RMSE values in some cases. These results suggest that ANN and MLR models generally offer better prediction performance, while the DTR model is more sensitive to the data structure.
The results of the models using machine learning methods for ROE are presented in Table 5.
The numerical results are presented in Figure 4 (R2 values).
The R2 values of the ANN model are generally high, indicating that it explains the variance of ROE well. For example, the ANN reaches its highest R2 value for LOGO (0.995) and its lowest for NETAS (0.411). The multiple linear regression (MLR) model also performs well with generally high R2 values. It has particularly high R2 values for DGATE (0.967) and PKART (0.977). However, the MLR model performs poorly for KAREL (0.376) and NETAS (0.252). The decision tree regression (DTR) model shows very high R2 values in some cases (e.g., 0.998 for DESPC and 0.991 for KRONT), while in other cases, it performs less well (e.g., 0.654 for ARENA and 0.451 for NETAS). In general, the ANN and MLR models explain the variance of the dependent variable well and provide reliable predictions with high R2 values. The DTR model, on the other hand, may exhibit high performance in some cases, but it is more sensitive to the data structure and shows more variability in its performance. These results suggest that ANN and MLR models are more consistent and reliable, while the DTR model may outperform them in certain cases. DTR is effective in capturing non-linear relationships in the data set. If the relationship between the dependent variable and independent variables is non-linear, DTR models can better capture this complexity. For example, the high R2 values of DTR models for DESPC and KRONT (0.998 and 0.991) may indicate the presence of such non-linear relationships.
Figure 5 shows the RMSE values of the three machine learning methods for each company's ROE.
All three methods produce usable ROE forecasts, but their error levels differ. According to Table 5, the ANN method reaches the lowest RMSE for most firms, for example KAREL (0.071) and KRONT (0.07). The MLR method has the lowest RMSE for ARENA (0.099) and FONET (0.107), and it is nearly tied with ANN for LOGO (0.091 against 0.09) and PKART (0.067 against 0.065). MLR works effectively in data sets where linear relationships are dominant, while the lower ANN errors for firms such as KAREL and KRONT suggest that ANN is more effective in data sets with complex and non-linear relationships. The DTR method yields the highest RMSE for every firm in Table 5. Its high error rates may stem from noise and complexity in the data sets and from the tendency of decision trees to overfit. In conclusion, the ANN method generally achieves the lowest RMSE values, the MLR method is competitive and is the best choice for some firms, and the DTR method shows the weakest performance with higher error rates in general. These results indicate that the performance of forecasting methods varies considerably depending on firm characteristics and the structure of the data sets.
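The per-firm comparison can be reproduced from the RMSE columns of Table 5. The sketch below is an illustration rather than part of the study: the values are transcribed from Table 5, while the names `ROE_RMSE`, `METHODS` and `best_method` are hypothetical helpers introduced for this example.

```python
from collections import Counter

METHODS = ("ANN", "MLR", "DTR")

# ROE RMSE values transcribed from Table 5, in the order (ANN, MLR, DTR).
ROE_RMSE = {
    "ALCTL": (0.179, 0.212, 0.224),
    "ARENA": (0.123, 0.099, 0.39),
    "DESPC": (0.118, 0.133, 0.363),
    "DGATE": (0.193, 0.255, 0.404),
    "FONET": (0.108, 0.107, 0.381),
    "INDES": (0.196, 0.22, 0.386),
    "INGRM": (0.207, 0.336, 0.343),
    "KAREL": (0.071, 0.144, 0.301),
    "KRONT": (0.07, 0.164, 0.356),
    "LINK": (0.127, 0.132, 0.323),
    "LOGO": (0.09, 0.091, 0.241),
    "NETAS": (0.153, 0.196, 0.216),
    "PKART": (0.065, 0.067, 0.236),
}


def best_method(rmse_values):
    """Return the name of the method with the lowest RMSE."""
    return min(zip(rmse_values, METHODS))[1]


winners = {firm: best_method(values) for firm, values in ROE_RMSE.items()}
tally = Counter(winners.values())

for firm, method in winners.items():
    print(f"{firm}: lowest RMSE with {method}")
print({method: tally[method] for method in METHODS})
```

The table reports values rounded to three decimals, so near ties such as LOGO (0.09 against 0.091) should be read with caution.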

5. Conclusions

The IT sector is a horizontal sector that affects all industries in an economy and plays a facilitating and productivity-enhancing role. In a world moving from an industrial society, where capital is important, to an information society, where information is important, productivity and innovation are the key drivers of growth. Through innovation, IT paves the way for the development of existing and emerging technologies and business areas, and it increases productivity and competitiveness. For these reasons, the sector has recently become one of the most active sectors for investors and policy makers. This interest stems from the fact that the sector is at the heart of innovation, competitiveness and economic growth. Given the rapidly evolving dynamics of the IT sector and the profitability-oriented strategies of investors, accurate and reliable forecasts are critical for managers.
This study aims to predict ROA and ROE with machine learning methods using the financial data of 13 technology firms listed in the Borsa Istanbul Technology index and operating between March 2000 and December 2023. The results obtained with artificial neural networks, multiple linear regression and decision tree regression show that these methods are effective for profitability forecasting. The ROA forecasting results reveal that the ANN and MLR methods perform better, especially ANN, with overall higher R² and lower error metrics (MAE, MSE, RMSE). The ROE prediction results show that the ANN and MLR methods achieve lower error metrics (MAE, MSE, RMSE) overall, while the DTR method reaches high R² values for some firms but with higher errors. These findings emphasize the effectiveness of machine learning methods in predicting the financial performance of companies.
In conclusion, profitability forecasting with machine learning methods in the IT sector has produced findings that are relevant from both academic and practical perspectives and has opened new directions for future research. The use of such methodologies in analyzing the financial performance of technology firms can be considered an important step in understanding the changes and innovations in the sector. This study can be a valuable resource for financial analysts and investors, contributing to more informed investment decisions in the sector. Future studies can examine profitability forecasting in the IT sector in more depth, focusing in particular on the following topics.
Comparison of Different Machine Learning Methods: Different machine learning methods, such as artificial neural networks, decision tree regression and multiple linear regression, can be compared with respect to their effectiveness in profitability forecasting. It is important to determine which method is more successful in which situations.
Analyzing Different Financial Indicators: In addition to ROA and ROE, the impact of other financial indicators (e.g., net profit margin) on profitability forecasting can be examined. How the use of multiple indicators affects forecasting accuracy can also be investigated.
Expanding Existing Data Sets: Evaluating different time periods or a larger sample of firms beyond the data set used in this study may increase the generalizability of forecasting models.
Comparison of Different Sectors: Comparative studies can be conducted on forecasting the financial performance of firms in industries other than the IT sector. In this way, the impact of sectoral differences and special conditions on forecasting models can be understood.

Author Contributions

Conceptualization, B.T., M.K., M.T., H.T. and F.Y.; Methodology, M.K., M.T., G.F.Ü.U. and F.Y.; Formal analysis, M.K.; Resources, B.T. and G.F.Ü.U.; Data curation, H.T.; Writing—original draft, B.T., M.K., M.T. and G.F.Ü.U.; Writing—review & editing, H.T. and F.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Penman, S.H. Financial Statement Analysis and Security Valuation, 5th ed.; McGraw-Hill Education: New York, NY, USA, 2013.
  2. Damodaran, A. Corporate Finance: Theory and Practice, 2nd ed.; Wiley: Hoboken, NJ, USA, 2007.
  3. Brigham, E.F.; Ehrhardt, M.C. Financial Management: Theory & Practice, 15th ed.; Cengage Learning: Boston, MA, USA, 2017.
  4. Higgins, R.C. Analysis for Financial Management, 10th ed.; McGraw-Hill Education: New York, NY, USA, 2012.
  5. Ross, S.A.; Westerfield, R.W.; Jaffe, J. Corporate Finance, 11th ed.; McGraw-Hill Education: New York, NY, USA, 2016.
  6. Pham, V.H.S.; Le, T.D. Research on Applying Machine Learning Models to Predict and Assess Return on Assets (ROA). Asian J. Civ. Eng. 2024.
  7. De Lucia, C.; Pazienza, P.; Bartlett, M. Does Good ESG Lead to Better Financial Performances by Firms? Machine Learning and Logistic Regression Models of Public Enterprises in Europe. Sustainability 2020, 12, 5317.
  8. Chen, T.-H.; Chang, R.-C. Using machine learning to evaluate the influence of FinTech patents: The case of Taiwan’s financial industry. J. Comput. Appl. Math. 2021, 390, 113215.
  9. Rai, P.; Mohapatra, B.B.; Meitei, A.J.; Jain, V. Major Determinants of Bank Profitability in India: A Machine Learning Approach. Glob. Bus. Rev. 2023, 09721509231184763.
  10. Chakri, P.; Pratap, S.; Gouda, S.K. An Exploratory Data Analysis Approach for Analyzing Financial Accounting Data Using Machine Learning. Decis. Anal. J. 2023, 7, 100212.
  11. Jones, S.; Moser, W.J.; Wieland, M.M. Machine Learning and The Prediction of Changes in Profitability. Contemp. Account. Res. 2023, 40, 2643–2672.
  12. Belesis, N.D.; Papanastasopoulos, G.A.; Vasilatos, A.M. Predicting the Profitability of Directional Changes Using Machine Learning: Evidence from European Countries. J. Risk Financ. Manag. 2023, 16, 520.
  13. Dong, X.; Dang, B.; Zang, H.; Li, S.; Ma, D. The Prediction Trend of Enterprise Financial Risk Based on Machine Learning ARIMA Model. J. Theory Pract. Eng. Sci. 2024, 4, 65–71.
  14. Zhang, C.; Zhang, H.; Liu, D. A Contrastive Study of Machine Learning on Energy Firm Value Prediction. IEEE Access 2019, 8, 11635–11643.
  15. Zahariev, A.; Angelov, P.; Zarkova, S. Estimation of Bank Profitability Using Vector Error Correction Model and Support Vector Regression. Econ. Altern. 2022, 28, 157–170.
  16. Kristof, T.; Virag, M. What Drives Financial Competitiveness of Industrial Sectors in Visegrad Four Countries? Evidence by Use of Machine Learning Techniques. J. Compet. 2022, 14, 117–136.
  17. Anand, V.; Brunner, R.; Ikegwu, K.; Sougiannis, T. Predicting Profitability Using Machine Learning. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3466478 (accessed on 22 August 2024).
  18. Nur-Al-Ahad, M.; Syeda, N.; Vagavi, P. Nexus Between Corporate Governance and Firm Performance in Malaysia: Supervised Machine Learning Approach. Financ. Mark. Inst. Risks 2019, 3, 115–130.
  19. Kayakuş, M.; Terzioglu, M.; Yetiz, F. Forecasting housing prices in Turkey by machine learning methods. Aestimum 2022, 80, 33–44.
  20. Agatonovic-Kustrin, S.; Beresford, R. Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research. J. Pharm. Biomed. Anal. 2000, 22, 717–727.
  21. Ahamed, K.I.; Akthar, S. A study on neural network architectures. Comput. Eng. Intell. Syst. 2016, 7, 1–7.
  22. Iqbal, A.; Aftab, S. A feed-forward and pattern recognition ANN model for network intrusion detection. Int. J. Comput. Netw. Inf. Secur. 2019, 14, 19.
  23. Wang, W.Q.; Li, M.J.; Guo, J.Q.; Tao, W.Q. A feedforward-feedback control strategy based on artificial neural network for solar receivers. Appl. Therm. Eng. 2023, 224, 120069.
  24. Hoeting, J.; Raftery, A.E.; Madigan, D. A method for simultaneous variable selection and outlier identification in linear regression. Comput. Stat. Data Anal. 1996, 22, 251–270.
  25. Uyanık, G.K.; Güler, N. A study on multiple linear regression analysis. Procedia-Soc. Behav. Sci. 2013, 106, 234–240.
  26. Kantar, Y.M. Generalized least squares and weighted least squares estimation methods for distributional parameters. REVSTAT-Stat. J. 2015, 13, 263–282.
  27. Ratner, B. Variable selection methods in regression: Ignorable problem, outing notable solution. J. Target. Meas. Anal. Mark. 2010, 18, 65–75.
  28. Nimon, K.F.; Oswald, F.L. Understanding the results of multiple linear regression: Beyond standardized regression coefficients. Organ. Res. Methods 2013, 16, 650–674.
  29. Naidu, M.S.; Geethanjali, N. Classification of defects in software using decision tree algorithm. Int. J. Eng. Sci. Technol. 2013, 5, 1332.
  30. Supsermpol, P.; Huynh, V.N.; Thajchayapong, S.; Chiadamrong, N. Predicting financial performance for listed companies in Thailand during the transition period: A class-based approach using logistic regression and random forest algorithm. J. Open Innov. Technol. Mark. Complex. 2023, 9, 100130.
  31. Malerba, D.; Esposito, F.; Ceci, M.; Appice, A. Top-down induction of model trees with regression and splitting nodes. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 612–625.
  32. Song, Y.Y.; Ying, L.U. Decision tree methods: Applications for classification and prediction. Shanghai Arch. Psychiatry 2015, 27, 130.
  33. Zhao, Y.; Zhang, Y. Comparison of decision tree methods for finding active objects. Adv. Space Res. 2008, 41, 1955–1959.
  34. Gajowniczek, K.; Ząbkowski, T. ImbTreeEntropy and ImbTreeAUC: Novel R packages for decision tree learning on the imbalanced datasets. Electronics 2021, 10, 657.
  35. Kayakuş, M.; Açıkgöz, F.Y. Classification of news texts by categories using machine learning methods. Alphanumeric J. 2022, 10, 155–166.
  36. Yağmur, A.; Kayakuş, M.; Terzioglu, M. House price prediction modeling using machine learning techniques: A comparative study. Aestimum 2022, 81, 39–51.
  37. Park, J.; Lee, W.H.; Kim, K.T.; Park, C.Y.; Lee, S.; Heo, T.Y. Interpretation of ensemble learning to predict water quality using explainable artificial intelligence. Sci. Total Environ. 2022, 832, 155070.
  38. Carlson, R.E.; Foley, T.A. The parameter R2 in multiquadric interpolation. Comput. Math. Appl. 1991, 21, 29–42.
  39. Hodson, T.O. Root mean square error (RMSE) or mean absolute error (MAE): When to use them or not. Geosci. Model Dev. Discuss. 2022, 2022, 1–10.
  40. Wei, Q.; Dunbrack, R.L., Jr. The role of balanced training and testing data sets for binary classifiers in bioinformatics. PLoS ONE 2013, 8, e67863.
  41. Dagum, C. The generation and distribution of income, the Lorenz curve and the Gini ratio. Econ. Appliquée 1980, 33, 327–367.
Figure 1. ANN structure.
Figure 2. R² values of models for ROA.
Figure 3. RMSE values of models for ROA.
Figure 4. R² values of models for ROE.
Figure 5. RMSE values of models for ROE.
Table 1. Studies on ROA and ROE in the literature.
| Author/Year | Article Title | Variables | Machine Learning Technique |
|---|---|---|---|
| Pham Vu Hong Son, Le Tung Duong (2024) [6] | Research on applying machine learning models to predict and assess return on assets (ROA) | Days sales outstanding, Days inventory outstanding, Days payable outstanding, Cash conversion cycle, Net working capital, Financial leverage, Current ratio, Growth rate, Return on equity, Earnings before interest and taxes, Gross margin | Lasso, Ridge, K Neighbor Regressor, SVR, Random Forest, GBR, XGBoost, MLP Regression |
| Caterina De Lucia, Pasquale Pazienza and Mark Bartlett (2020) [7] | Does Good ESG Lead to Better Financial Performances by Firms? Machine Learning and Logistic Regression Models of Public Enterprises in Europe | Emission score, CO2 equivalent emissions, Resource use score, ESG score, Environmental innovation score, Salary gap, Number of employees in the CSR reporting, Number of women employees, Workforce score, Management score, Operating income, Net income, Change in total equity | Random Forest, Support Vector Regression, K-Nearest Neighbor Model, Artificial Neural Network Model, Ridge Regression |
| Ting-Hsuan Chen, Rong-Cih Chang (2021) [8] | Using machine learning to evaluate the influence of FinTech patents: The case of Taiwan’s financial industry | (Interest income − Interest expense)/Total assets, Net income after tax/Operating income, Log(Total assets), Qualified capital/Risk weighted assets, Patents | Logistic Regression, Multilayer Perceptron, Decision Tree, Random Forest, Naive Bayes Classifier, Bayesian Network, Support Vector Machine |
| Pratibha Rai, Bibhuti Bhusan Mohapatra, Vanita Jain (2023) [9] | Major Determinants of Bank Profitability in India: A Machine Learning Approach | Financial crisis, Unemployment, Inflation, GDP growth rate, Real effective exchange rate | Random forest (RF) classification algorithm, executed using the CARET package in R; the results obtained from feature selection are corroborated by the RF classification findings |
| Potta Chakri, Saurabh Pratap, Lakshay, Sanjeeb Kumar Gouda (2023) [10] | An exploratory data analysis approach for analyzing financial accounting data using machine learning | Total Revenue | Linear Regression, KNN (K Nearest Neighbor), SVR (Support Vector Regression), and Decision Tree |
| Stewart Jones, William J. Moser, Matthew M. Wieland (2023) [11] | Machine learning and the prediction of changes in profitability | On Average RNOA, Net Operating Assets | TreeNet, OLS Regression, PZ Regression |
| Nicholas D. Belesis, Georgios A. Papanastasopoulos and Antonios M. Vasilatos (2023) [12] | Predicting the Profitability of Directional Changes Using Machine Learning: Evidence from European Countries | ROA/ROE/RNOA/FCF/CFO | Linear Discriminant Analysis (LDA), K-Nearest Neighbor (KNN), Decision Trees (DT), Random Forest (RF) |
| Xinqi Dong, Bo Dang, Hengyi Zang, Shaojie Li, Danqing Ma (2024) [13] | The Prediction Trend of Enterprise Financial Risk based on Machine Learning ARIMA Model | Profit, Revenue, Expenditure, Assets, Liabilities | Support Vector Machine, Logistic Regression, Neural Network, ARIMA regression model |
| Chuqing Zhang, Han Zhang and Dunnan Liu (2019) [14] | A Contrastive Study of Machine Learning on Energy Firm Value Prediction | Deal value; Input: EBIT, ROE, ROA, CAPEX, M&A Type, Asset Turnover, Cash Debt Ratio, Total Debt to Assets, Firm Type, Nationality, Acquisition Year, Share | Decision Tree Regression, Support Vector Regression, Artificial Neural Network |
| Andrey Zahariev, Petko Angelov, Silvia Zarkova (2022) [15] | Estimation of Bank Profitability Using Vector Error Correction Model and Support Vector Regression | ROE and ROA | Support Vector Regression and Vector Error Correction Model |
| Kristof Tamas and Miklos Virag (2022) [16] | What drives financial competitiveness of industrial sectors in Visegrad Four countries? Evidence by use of machine learning techniques | ROA and ROE | K-Nearest Neighbor (KNN) and Random Forest (RF) methods |
| Vic Anand, Robert Brunner, Kelechi Ikegwu, Theodore Sougiannis (2019) [17] | Predicting Profitability Using Machine Learning | ROA (return on assets), ROE (return on equity), CFO (cash flow from operations), RNOA (return on net operating assets) and FCF (free cash flow) | Random Forests |
| Md. Nur-Al-Ahad, Syeda Nusrat, Vagavi Prakash (2019) [18] | Nexus Between Corporate Governance and Firm Performance in Malaysia: Supervised Machine Learning Approach | ROA and ROE | Supervised Machine Learning Approach |
Table 2. Technology firms in the Borsa Istanbul Technology Index.
| Company | Code |
|---|---|
| Alcatel Lucent Teletas Telecommunications Co. (Boulogne-Billancourt, France) | ALCTL |
| Arena Computer Industry and Trade Co. (İstanbul, Turkey) | ARENA |
| Despec Computer Marketing and Trading Co. (Dubai, United Arab Emirates) | DESPC |
| Datagate Computer Materials Trading Co. (Nicosia, Cyprus) | DGATE |
| Fonet Information Technologies Co. (Ankara, Turkey) | FONET |
| Indeks Computer Systems Engineering Industry and Trade Co. (İstanbul, Turkey) | INDES |
| Ingram Micro Computer Systems Co. (Irvine, CA, USA) | INGRM |
| Karel Electronics Industry and Trade Co. (Istanbul, Turkey) | KAREL |
| Kron Technology Co. (Jersey City, NJ, USA) | KRONT |
| Link Computer Systems Software and Hardware Industry and Trade Co. (Istanbul, Turkey) | LINK |
| Logo Software Industry and Trade Co. (Gebze, Turkey) | LOGO |
| Netas Telecommunications Co. (Istanbul, Turkey) | NETAS |
| Plastikkart Smart Card Communication Systems Industry and Trade Co. (Istanbul, Turkey) | PKART |
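Table 3. Dependent and independent variables used in the study.
| Variable Type | Variable | Category |
|---|---|---|
| Dependent | Return on Assets (ROA) | |
| Dependent | Return on Equity (ROE) | |
| Independent | Receivables Turnover | Operational Ratios |
| Independent | Inventory Turnover | Operational Ratios |
| Independent | Fixed Asset Turnover | Operational Ratios |
| Independent | Fixed Assets/Total Assets (%) | Operational Ratios |
| Independent | Current Assets/Total Assets (%) | Operational Ratios |
| Independent | Tangible Fixed Assets Turnover | Operational Ratios |
| Independent | Price Earnings Ratio | Stock Exchange Ratios |
| Independent | Earnings per Share | Stock Exchange Ratios |
| Independent | Cost of Sales/Net Sales (%) | Profitability Ratios |
| Independent | EBITDA Margin (%) | Profitability Ratios |
| Independent | Net Profit Growth (%) | Profitability Ratios |
| Independent | Net Profit Margin (%) | Profitability Ratios |
| Independent | Leverage Ratio (%) | Financial Structure Ratio |
| Independent | Current Ratio | Liquidity Ratio |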
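Table 4. Estimation results for ROA.
| Firm | ANN R² | ANN MAE | ANN MSE | ANN RMSE | MLR R² | MLR MAE | MLR MSE | MLR RMSE | DTR R² | DTR MAE | DTR MSE | DTR RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALCTL | 0.836 | 0.124 | 0.025 | 0.159 | 0.778 | 0.114 | 0.018 | 0.136 | 0.683 | 0.253 | 0.097 | 0.312 |
| ARENA | 0.707 | 0.095 | 0.014 | 0.119 | 0.88 | 0.056 | 0.007 | 0.082 | 0.958 | 0.258 | 0.119 | 0.345 |
| DESPC | 0.878 | 0.077 | 0.01 | 0.101 | 0.774 | 0.169 | 0.038 | 0.196 | 0.626 | 0.252 | 0.126 | 0.355 |
| DGATE | 0.79 | 0.103 | 0.022 | 0.149 | 0.753 | 0.07 | 0.009 | 0.094 | 0.917 | 0.265 | 0.094 | 0.306 |
| FONET | 0.969 | 0.063 | 0.007 | 0.082 | 0.919 | 0.078 | 0.013 | 0.112 | 0.816 | 0.196 | 0.085 | 0.292 |
| INDES | 0.654 | 0.072 | 0.008 | 0.088 | 0.84 | 0.059 | 0.005 | 0.073 | 0.189 | 0.023 | 0.078 | 0.28 |
| INGRM | 0.745 | 0.115 | 0.022 | 0.147 | 0.662 | 0.117 | 0.035 | 0.186 | 0.54 | 0.316 | 0.118 | 0.343 |
| KAREL | 0.869 | 0.11 | 0.015 | 0.123 | 0.924 | 0.091 | 0.013 | 0.116 | 0.814 | 0.303 | 0.157 | 0.397 |
| KRONT | 0.723 | 0.103 | 0.015 | 0.124 | 0.857 | 0.093 | 0.019 | 0.136 | 0.487 | 0.218 | 0.077 | 0.278 |
| LINK | 0.973 | 0.061 | 0.007 | 0.086 | 0.937 | 0.063 | 0.007 | 0.085 | 0.983 | 0.336 | 0.24 | 0.49 |
| LOGO | 0.961 | 0.081 | 0.009 | 0.095 | 0.886 | 0.094 | 0.014 | 0.118 | 0.657 | 0.365 | 0.165 | 0.407 |
| NETAS | 0.913 | 0.019 | 0.001 | 0.03 | 0.773 | 0.044 | 0.003 | 0.056 | 0.484 | 0.2 | 0.096 | 0.311 |
| PKART | 0.806 | 0.082 | 0.019 | 0.137 | 0.947 | 0.09 | 0.012 | 0.111 | 0.725 | 0.241 | 0.13 | 0.361 |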
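Table 5. Estimation results for ROE.
| Firm | ANN R² | ANN MAE | ANN MSE | ANN RMSE | MLR R² | MLR MAE | MLR MSE | MLR RMSE | DTR R² | DTR MAE | DTR MSE | DTR RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALCTL | 0.847 | 0.292 | 0.143 | 0.179 | 0.503 | 0.178 | 0.045 | 0.212 | 0.913 | 0.17 | 0.05 | 0.224 |
| ARENA | 0.945 | 0.086 | 0.015 | 0.123 | 0.89 | 0.084 | 0.01 | 0.099 | 0.654 | 0.298 | 0.152 | 0.39 |
| DESPC | 0.95 | 0.094 | 0.014 | 0.118 | 0.908 | 0.108 | 0.018 | 0.133 | 0.998 | 0.269 | 0.132 | 0.363 |
| DGATE | 0.714 | 0.155 | 0.037 | 0.193 | 0.967 | 0.045 | 0.003 | 0.255 | 0.906 | 0.317 | 0.163 | 0.404 |
| FONET | 0.875 | 0.097 | 0.012 | 0.108 | 0.879 | 0.096 | 0.012 | 0.107 | 0.716 | 0.28 | 0.145 | 0.381 |
| INDES | 0.674 | 0.151 | 0.038 | 0.196 | 0.561 | 0.154 | 0.049 | 0.22 | 0.753 | 0.286 | 0.149 | 0.386 |
| INGRM | 0.437 | 0.177 | 0.043 | 0.207 | 0.837 | 0.273 | 0.113 | 0.336 | 0.916 | 0.291 | 0.118 | 0.343 |
| KAREL | 0.938 | 0.058 | 0.005 | 0.071 | 0.376 | 0.122 | 0.021 | 0.144 | 0.537 | 0.267 | 0.09 | 0.301 |
| KRONT | 0.925 | 0.059 | 0.005 | 0.07 | 0.651 | 0.112 | 0.027 | 0.164 | 0.991 | 0.25 | 0.126 | 0.356 |
| LINK | 0.912 | 0.096 | 0.022 | 0.127 | 0.947 | 0.112 | 0.017 | 0.132 | 0.897 | 0.218 | 0.104 | 0.323 |
| LOGO | 0.995 | 0.076 | 0.008 | 0.09 | 0.993 | 0.084 | 0.008 | 0.091 | 0.797 | 0.206 | 0.058 | 0.241 |
| NETAS | 0.411 | 0.109 | 0.045 | 0.153 | 0.252 | 0.131 | 0.038 | 0.196 | 0.451 | 0.111 | 0.047 | 0.216 |
| PKART | 0.984 | 0.051 | 0.004 | 0.065 | 0.977 | 0.049 | 0.005 | 0.067 | 0.613 | 0.156 | 0.056 | 0.236 |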
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tutcu, B.; Kayakuş, M.; Terzioğlu, M.; Ünal Uyar, G.F.; Talaş, H.; Yetiz, F. Predicting Financial Performance in the IT Industry with Machine Learning: ROA and ROE Analysis. Appl. Sci. 2024, 14, 7459. https://doi.org/10.3390/app14177459
