Article

Enhancing Smartphone Battery Life: A Deep Learning Model Based on User-Specific Application and Network Behavior

by Daniel Flores-Martin 1,†, Sergio Laso 2,*,† and Juan Luis Herrera 2,†

1 COMPUTAEX, Extremadura Supercomputing Center, 10004 Cáceres, Spain
2 Department of Computer Science and Communications Engineering, Universidad de Extremadura, 10004 Cáceres, Spain
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Electronics 2024, 13(24), 4897; https://doi.org/10.3390/electronics13244897
Submission received: 29 October 2024 / Revised: 3 December 2024 / Accepted: 8 December 2024 / Published: 12 December 2024
(This article belongs to the Special Issue Ubiquitous Computing and Mobile Computing)

Abstract:
Smartphones have become a central element in modern society, with their widespread adoption driven by technological advancements and their ability to facilitate everyday tasks. A critical feature influencing user satisfaction and smartphone adoption is battery life, as the intensive use of mobile devices can significantly drain battery power. This paper addresses the challenge of predicting smartphone battery consumption using artificial intelligence techniques, specifically deep learning, to optimize energy efficiency. By collecting and analyzing data from mobile devices, such as application usage, screen time, network type, network usage, and battery temperature, among others, we developed a predictive model tailored to user-specific behavior. This model identifies the key variables affecting battery consumption and provides personalized energy-saving strategies. Our approach offers a solution for improving battery performance, contributing to more efficient energy management in both hardware and networking terms while adapting to individual usage patterns. The results demonstrate that our approach can accurately predict battery levels and anticipate power demands based on user-specific usage. While challenges remain, such as improving the generalizability of the model across different devices, this approach provides a scalable and adaptive method to improve the energy efficiency of smartphones, enabling management solutions that contribute to better battery and network usage, an improved user experience, and greater device longevity.

1. Introduction

Smartphones have become immensely important devices in today’s society due to their multiple functions and their ability to facilitate daily life. This has led to widespread adoption and a significant market that generates substantial revenue [1]. The smartphone is one of the most in-demand and competitive products in the market [2]. Its constant evolution, driven by technological advances such as artificial intelligence (AI), high-resolution cameras, and 5G connectivity, has led to a continuous renewal of devices, fostering innovation and competition among manufacturers [3]. Additionally, its relevance is reflected in the global economy, where smartphone sales represent a significant portion of revenue for tech companies. Smartphones have also transformed entire sectors, such as e-commerce and digital advertising, by facilitating access to products and services anytime and anywhere. In summary, the smartphone is a central element both in our daily lives and in the market, with an impact that continues to grow as technology advances.
Among the main features of these devices, the battery plays a crucial role for users, as intensive use can deplete it quickly [4,5]. Battery life is one of the main factors influencing user satisfaction with mobile devices [6] and is considered one of the most important factors affecting smartphone purchase decisions [5]. Developers are also highly concerned about the impact of their applications on battery life [7], since excessive battery consumption in an application can lead to poor reviews in app stores. Consequently, these devices increasingly feature faster charging speeds and even higher capacities. However, due to intrinsic characteristics such as design and efficiency, managing battery consumption remains a challenge in this field.
Battery consumption prediction in mobile devices is a crucial area due to the importance of battery life for both users and developers [8]. AI plays a fundamental role in enhancing the energy efficiency of these devices by enabling precise energy consumption predictions, which facilitates resource optimization and prolongs battery life [9]. AI applications in this area include the use of predictive models based on machine learning, such as neural networks and regression models, which analyze complex consumption patterns based on various parameters, such as CPU usage intensity, screen brightness, and network usage [10]. Additionally, AI enables resource optimization through techniques like Dynamic Voltage and Frequency Scaling (DVFS) and predicting application behavior to anticipate their energy demands. Historical usage-based models are also employed to predict battery life, offering notifications and suggestions to users on how to extend it [11]. AI is also useful in simulating and evaluating energy consumption, allowing developers to simulate different workload scenarios and assess their real-time impact. Nevertheless, challenges remain, such as the need to develop models that are accurate and generalize well across different devices and conditions as well as balancing the energy consumption of the AI models themselves with the benefits they provide [12]. Furthermore, there is a need for models to be more personalized to adapt to the specific usage patterns of each user.
This work addresses this challenge by leveraging AI techniques, and more specifically deep learning (DL), to predict battery consumption and optimize smartphone energy use. Through the collection and analysis of data from mobile devices, this work aims to build a predictive model that identifies the key variables impacting battery life. Firstly, different mobile device variables, such as application usage time, screen-on time, Internet network type, and battery temperature, among others, have been collected and analyzed to determine how they affect battery levels. To collect and analyze these variables, a mobile application designed for this purpose has been developed, as existing tools like Battery Historian [13] are not suitable due to their complexity and lack of accessibility for non-technical users. Additionally, commercial applications do not provide all the required data or the ability to export them for further analysis, necessitating the development of a dedicated solution to meet the specific requirements of this study. This allows us to identify which variables most affect battery life depending on device usage. Secondly, the collected data have been cleaned, and DL techniques have been applied to generate a user-specific model that predicts battery levels based on the identified characteristics and variables. The main novelty of this work lies in the generation of a fully customized model for each user, integrating contextual data from sensors and usage patterns in different scenarios to personalize predictions. By learning from individual user behavior, the proposed approach can predict how long the battery will last based on the applications in use, screen brightness, and environmental conditions. Moreover, the customized user models leverage standard DL techniques, making the proposal simple to implement and reproduce. This approach could even be used to predict other aspects such as CPU, RAM, or network usage if required. It allows for the creation of more efficient battery management strategies, offering personalized energy-saving solutions based on user behavior. Through the evaluation of these models on different devices and users, this work aims to provide a scalable solution that improves smartphone battery performance while adapting to each individual’s unique usage patterns.
The rest of the document is organized as follows. Section 2 presents the motivations and background of the work. In Section 3, the design and implementation of the tests are detailed. The results are shown in Section 4. Then, Section 5 discusses the results and the limitations that have been detected. Section 6 analyzes related work. Finally, conclusions and future work are drawn in Section 7.

2. Background and Motivations

Smartphone usage has experienced explosive growth in recent decades, affecting virtually every aspect of daily life. As of 2023, there are estimated to be more than 6.9 billion smartphone users worldwide, representing approximately 86% of the global population [14]. In developed regions, such as the United States and Western Europe, the penetration of these devices exceeds 90%. This massive adoption has led to a significant increase in the time people spend on their phones. On average, users spend between 3 and 5 h a day on their smartphones, although in some countries, such as the United States, this time can exceed 5 h [15,16].
In this context, AI has become a cornerstone of smartphone technology. The most common activities performed on smartphones include the use of social media applications (e.g., Instagram, Facebook, TikTok, Twitter) that rely on AI algorithms to present relevant content [17], messaging services (e.g., WhatsApp) that are becoming integrated with text generation AIs like GPT [18], streaming platforms for entertainment (e.g., YouTube, Netflix) that leverage AI to recommend content to the user [19], or gaming, where AI is leading technology to improve resolution and graphical quality [20]. Many of these AI systems, such as those leveraged by predictive keyboards or phone cameras [21], are executed directly on the smartphones themselves, contributing to the ubiquity of AI techniques.
Within AI techniques, DL has become of critical importance in recent years, because it can solve complex problems very efficiently, thus having a major impact on various industries. One of the main reasons why it is so relevant is its ability to process huge amounts of data. In a world where information is generated on an unprecedented scale, such as images, videos, and transactions, deep neural networks—which are the basis of DL—can detect complex patterns and deliver very accurate results. This is vital in areas such as e-commerce, social networking, and especially in medicine [22,23].
DL has proven to be especially powerful in detecting behavioral patterns in people, making it a key tool in several areas that rely on understanding and analyzing human behavior [24,25]. Through deep neural networks, artificial intelligence systems can identify, interpret, and predict complex behavioral patterns from large amounts of data. This is possible because DL is not only able to analyze data efficiently but also to learn from it without direct human intervention, extracting features and patterns that might otherwise go unnoticed [26]. In advertising and marketing, DL is used to analyze users’ online behavior, identifying their preferences, buying habits, and browsing patterns [27,28]. E-commerce and social media platforms, for example, use DL models to personalize ads and recommendations based on users’ past behavior. These systems can predict what type of content, products, or services a person is most likely to want or need, resulting in a more personalized and efficient experience for both consumers and businesses.
Therefore, applying DL to smartphone data can be key to improving battery usage by understanding user behavior patterns. Smartphones collect information about app usage, connection habits, screen brightness, and other activities that directly impact power consumption. By analyzing these data, DL can predict when and how users use their devices, allowing it to automatically adjust settings such as screen brightness, background app management, or network connection, thus optimizing power consumption. Among the main benefits, DL can help maximize battery savings based on user habits, turning off unnecessary functions when not in use, or adjusting device performance according to actual needs. In addition, it can learn to predict when the user will need longer battery life, adapting power usage to extend device autonomy at key times.
DL offers a solution because it can analyze behavioral patterns in a personalized way, automatically adjusting settings such as network usage and screen brightness, or closing unnecessary apps in the background, thus optimizing energy consumption based on each person’s lifestyle. This is crucial as, with the increasing use of smartphones for essential tasks such as browsing, work, entertainment, and communication, the demand for longer battery life is higher than ever. Without a smart approach to battery management, mobile technology cannot live up to users’ expectations in terms of autonomy. In this sense, DL offers significant advantages for creating personalized models that predict battery consumption in smartphones by adapting to each user’s unique patterns. These models analyze data such as app usage, activated sensors, and screen-on time, learning how these factors impact device energy consumption, and they can even support related tasks such as human activity recognition [29]. DL’s ability to handle complex and diverse data makes it ideal for this purpose. One of its key strengths is personalization: DL models can adapt to specific user habits, such as app usage frequency, sensor activity, and charging behaviors. Additionally, they can combine these data sources to understand how interactions, like using GPS while running heavy apps, influence energy consumption. DL algorithms, especially recurrent neural networks like LSTMs, are effective for processing time series data, which enables accurate predictions of battery usage based on historical and real-time data. They also excel at dynamically optimizing device resources, offering proactive recommendations such as closing apps or adjusting settings to extend battery life. Compared to traditional methods, DL is superior at identifying complex patterns and adapting to changes in user behavior. This ensures that predictions are more accurate and practical, enhancing both the user experience and the smartphone’s energy efficiency.
To fulfill this need and ensure users’ expectations can be met, this work proposes an approach to detect smartphone usage patterns based on different mobile applications and describes how this proposal can be used to improve battery consumption.

3. Deployment and Implementation

The implementation of this work is divided into two main parts. The first part entails the development of an Android application to passively collect smartphone usage data. This application captures key metrics such as app usage, battery consumption, network traffic, screen activity, and memory usage, compiling this information into a structured format for further analysis. The second part addresses the application of DL techniques to the collected data to detect patterns in user behavior and system performance. By leveraging these insights, the aim is to identify opportunities for optimizing energy consumption, ultimately improving battery life without compromising user experience.

3.1. Data Collection

The intensive use of smartphones has generated growing concerns regarding battery consumption, which often fails to meet the daily demands of users. The proliferation of applications and services running in the background contributes to this issue, causing devices to require optimization in the use of their energy resources. However, many users are unaware of the factors that most influence battery consumption and lack effective tools to manage them efficiently.
In this context, the implementation of a mobile application that enables the passive collection of smartphone usage statistics becomes essential. This application would capture key data, such as app usage, screen activity, and real-time energy consumption, without interfering with the user experience. By processing and analyzing these data, usage patterns and areas for improvement could be identified, allowing developers to optimize both software and hardware, enhancing the energy efficiency of the devices. This solution would not only help extend battery life but also contribute to an improved user experience and technological sustainability.
 Therefore, the first part of this work involves implementing an Android application that passively collects smartphone usage statistics. For the implementation of the application, a preliminary analysis of the Battery Historian framework was carried out, which is a tool that provides detailed statistics on a device’s battery consumption over time [13]. Battery Historian allows the visualization of system-level energy use events through an HTML representation generated from system logs. In the analysis of individual applications, the tool offers various data that help identify behavioral patterns that generate high energy consumption.
However, extracting these data is not trivial, as Battery Historian is not designed for direct use by end users due to its complexity and the need for advanced technical knowledge. Specifically, utilizing Battery Historian requires enabling developer mode and USB debugging on the device, connecting it to a computer via USB, and using ADB [30] (Android Debug Bridge) commands to extract the system logs. These logs must then be processed on a separate computer where the Battery Historian tool is executed. This multi-step process, which involves both technical expertise and additional hardware, makes Battery Historian unsuitable for inclusion in a solution designed for everyday users.
Moreover, commercial applications that monitor battery status, amperage, or usage [31,32,33], as well as Google’s Digital Wellbeing application [34] for tracking app usage, provide several parameters utilized in this study. However, these applications merely display values without offering options for exporting or utilizing the data for further analysis.
To overcome these barriers, an application was developed that, upon starting, automatically generates a CSV file in the device’s download folder. This file contains all the information collected about the device’s energy usage in an accessible and organized way. In the following sections, the development process of this application will be detailed step by step.
Various key metrics related to smartphone usage have been analyzed, including screen activity time, app usage, network connection, battery charge, discharge cycles, and processor behavior. To efficiently capture these usage data, different specialized libraries that allow access to these metrics in a precise and real-time manner were identified and evaluated. Among the most relevant libraries are those that provide access to operating system statistics services and APIs that monitor energy status and application performance, facilitating data collection for further analysis and battery consumption optimization.
Several key considerations must be addressed regarding the variables in this dataset. First, the variable “Top 5 most-used applications (name)” includes an “×5” because the generated CSV contains five columns representing the five mobile applications with the highest usage time. Consequently, there are also five corresponding columns for the variable “Top 5 most-used applications (time in seconds)”, which details the usage time for each application.
A similar structure applies to the variables “Top 5 highest CPU-consuming applications (name)” and “Top 5 highest CPU-consuming applications (percentage)”. These variables reflect the same applications identified in the usage time metrics, but here the focus is on the estimated CPU consumption. A critical point is that the CPU consumption is an estimate, as Android does not provide a direct API for obtaining per-application CPU usage. Therefore, CPU consumption is inferred mathematically, assuming that the most-used applications consume the most CPU resources, although this may not always be accurate.
Additionally, two more variables—age and gender—have been incorporated. These are explicitly requested from the user and are not collected through Android libraries. The rationale for including these variables stems from prior survey results, which suggested that age and gender might influence mobile device usage patterns. The purpose of including them in the dataset was to explore whether these demographic factors could impact battery life. However, these variables were eventually excluded from the final dataset, as the battery consumption model is personalized for each user, rendering age and gender irrelevant to the analysis.
It is important to note that the inclusion of these variables was based on initial hypotheses without prior knowledge of how they would affect the target variable (battery consumption). The reasoning behind their inclusion was a preliminary assumption of an indirect correlation, predicting how these demographic factors might influence battery usage. As the project progressed, further analysis was conducted to determine which variables were genuinely useful for building the predictive models.
To collect these data, an Android application was developed that works as follows:
  • Installation and initial execution: After being installed on the device, the application is executed for the first time and requests the necessary permissions such as access to the internet, storage, and background execution. These permissions are essential for its proper operation.
  • Collection of personal information: During the initial setup, the user is prompted to fill out a short form with personal information, such as age, gender, and data collection interval (10 min by default). These data might influence the device’s behavior or battery usage and help to customize the prediction model.
  • Background operation: Once configured, the application mainly operates in the background, collecting usage data from the device (such as applications used, usage patterns, battery charge, and discharge cycles, among others), according to the variables detailed above. Table 1 and Table 2 detail different columns for an example CSV file generated with these data.
  • CSV Generation and DL processing: The application generates a CSV file with the user’s information. This file is extracted from the mobile device and used to generate the customized DL model on an external server with higher computational capabilities to speed up the process. This model predicts battery consumption based on the user’s device usage habits and optimizes performance by adjusting system parameters or notifying the user when necessary to improve energy efficiency.
Figure 1 details the flowchart of the application. This application aims to improve the device’s battery life based on each user’s behavior through the use of advanced artificial intelligence techniques.
Finally, to evaluate the impact of the data collection application on battery consumption, detailed measurements were performed over a 24 h period using the Battery Historian framework. Table 3 summarizes the key metrics obtained during this evaluation. The estimated power usage of the application was minimal, representing only 0.07% of the device’s total power consumption. In addition, the application executed foreground activities only four times within the measured duration of 25 h, 1 min, and 6.88 s. The CPU usage time recorded was similarly low, totaling only 24.36 s over the entire period. These results demonstrate that the application imposes negligible power overhead, ensuring that it does not significantly affect the device’s battery life during normal use.

3.2. Data Processing

Once the application, its functioning, and how the data are ultimately obtained have been explained, we will describe how to apply DL algorithms to these datasets to achieve the project’s final goal: predicting resource usage on a mobile device. To accomplish this, a sequential methodology will be followed with various procedures: data preprocessing, analysis of the obtained variables, model definition, model compilation, model training, model evaluation, and observation and conclusions from the results obtained. The explanation includes detailed code in Python that leverages the Pandas library to perform data processing. The code used for the different processes is included in Appendix A.

3.2.1. Data Preprocessing

Once the data have been obtained, it is not possible to define and train a model without first examining how the data are presented, as there may be inconsistencies, blank rows, string values, etc.

First Step: Handling NaN Values

The generated CSV file must be imported using the appropriate encoding and delimiter, and all rows containing NaN (missing) values must then be removed. This is because the Android application may produce slightly inconsistent values depending on the device (manufacturer and model) while running in the background, and if it fails to collect a specific feature, the row for that particular time measurement is deleted to avoid future problems.
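As a minimal illustrative sketch of this step (the filename, delimiter, and encoding are hypothetical and depend on the actual CSV produced by the application), the import and NaN removal could look as follows:

```python
import pandas as pd

# Hypothetical filename, delimiter, and encoding: adjust to the CSV generated by the app.
df = pd.read_csv("battery_data.csv", sep=";", encoding="utf-8")

# Drop every row containing at least one NaN (missing) value, since incomplete
# measurements for a given timestamp would cause problems in later steps.
df = df.dropna().reset_index(drop=True)

print(f"{len(df)} complete rows remain after removing NaN values")
```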

Second Step: Processing String Variables

The string-type variables were processed, as there are variables where the value is a text string. These variables include the following:
  • Applications with the highest usage time (name);
  • Applications with the highest CPU consumption (name);
  • Network connection;
  • Type of network;
  • Charge status;
  • Gender.
TensorFlow models do not accept string-type values, since they operate on data through mathematical operations. Therefore, all these text variables need to be converted to numerical values. To achieve this, specific mappings are defined for the categorical (string) variables and applied to the corresponding columns to convert categorical values into numerical ones. For example, in the Charge status variable, a value of “yes” is converted to 1, and a value of “no” is converted to 0. However, the same logic cannot be applied to the two remaining categorical variables, Applications with the longest usage time (name) and Applications with the highest CPU usage (name), because their values vary from user to user: one user’s five most-used applications are very likely to differ from another’s.
In this regard, a list was created with the columns containing the names of applications, specifically those containing “(name)” in the title. There are five columns with the same title to identify five different applications for the Applications with the longest usage time (name) variable, and there are another five columns to differentiate the same previous applications referenced in the Applications with the highest CPU usage (name) variable.

Third Step: Generating Unique Application Mappings

Finally, a unique list of all the applications present in the selected columns was created. Then, a mapping of each unique application to a numerical value was generated.
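A minimal sketch of these mappings is shown below; the column names are hypothetical placeholders for the headers of the generated CSV, and the snippet reuses the df dataframe loaded during the NaN-handling step:

```python
import pandas as pd

# Fixed mapping for a simple categorical variable (hypothetical column name).
df["Charge status"] = df["Charge status"].map({"yes": 1, "no": 0})

# Columns whose values are application names: five for usage time and five for CPU usage.
name_columns = [col for col in df.columns if "(name)" in col]

# Build a single mapping from every distinct application name to an integer code...
unique_apps = pd.unique(df[name_columns].values.ravel())
app_mapping = {app: idx for idx, app in enumerate(unique_apps)}

# ...and apply the same mapping to every application-name column.
for col in name_columns:
    df[col] = df[col].map(app_mapping)
```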

3.2.2. Variables Analysis

Once the data have been prepared and are in a ready-to-use format, the next step is to analyze which of the collected variables significantly influence the target variable, “Battery Level (%)”. Since this is the variable we aim to predict based on the input features, it is essential to identify which variables have a direct impact on the target. Additionally, it is crucial to examine the relationships between the variables to determine which are dependent and which are independent of one another.
To achieve this, various visualization techniques can be utilized to analyze the variables. In this case, visualization techniques from the Seaborn library were employed, which, together with Matplotlib, enabled the creation and better understanding of the graphs.
The first function used from Seaborn is heatmap, which generates a heatmap showing a numerical correlation value for each pair of variables. This value ranges from −1.00 to 1.00, where −1.00 indicates a completely inversely proportional relationship and 1.00 a directly proportional one; the sign therefore indicates whether the correlation is positive or negative. An example for the OnePlus LE2123 device is shown in Figure 2.
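A minimal sketch of how such a heatmap could be produced with Seaborn, assuming df holds the already numerical dataset, is the following:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Pearson correlation between every pair of numerical variables.
corr = df.corr(numeric_only=True)

# Heatmap with the correlation coefficient annotated in each cell, from -1.00 to 1.00.
plt.figure(figsize=(14, 12))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1.0, vmax=1.0)
plt.title("Correlation between collected variables")
plt.tight_layout()
plt.show()
```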
The first set of dependent variables identified is related to RAM memory. Since we have three variables, Total RAM (MB), Available RAM (MB), and Used RAM (MB), it is obvious that these three are interrelated—when available memory increases, used memory decreases, and vice versa. Therefore, only one of these variables, Used RAM (MB), was retained. A similar scenario occurs with storage-related variables. Here, too, we have three dependent variables, so only the variable defining used storage was kept.
Lastly, the case of network traffic is handled similarly. We have the following variables: Total Network Traffic Received (MB), Total Network Traffic Sent (MB), Mobile Data Received (MB), Mobile Data Sent (MB), Wi-Fi Data Received (MB), Wi-Fi Data Sent (MB), Device Network Traffic Received During Execution (MB), and Device Network Traffic Sent During Execution (MB).
It is clear that the total data received is the sum of Wi-Fi and mobile data received, and the same applies to data sent. As these variables are directly proportional, we decided to retain only Total Network Traffic Received (MB) and Total Network Traffic Sent (MB). This aspect is shown in Figure 3.
Thus, based on the dependencies between variables, the following were removed: Total RAM (MB), Available RAM (MB), Total Storage (GB), Available Storage (GB), Mobile Data Received (MB), Mobile Data Sent (MB), Wi-Fi Data Received (MB), Wi-Fi Data Sent (MB), Device Network Traffic Received During Execution (MB), and Device Network Traffic Sent During Execution (MB).

Influence on the Target Variable

The next step is to examine the influence of the remaining variables on the target variable, i.e., the battery level. To analyze this influence, we have performed a statistical regression on the selected variables, with each of them as an independent variable trying to predict battery level as the dependent variable. The results of the analysis are provided in Table 4. For each independent variable, we report its regression coefficient, its error, and its p-value.
To perform the selection of variables, the p-value is leveraged. The p-value of a variable represents the likelihood of obtaining similar results if that variable were ignored. Hence, low p-values indicate statistical significance, as it would be unlikely to obtain the same results without the variable. The selection is thus based on a 95% significance level; i.e., we select variables with a p-value equal to or under 0.05. These values are highlighted in bold in Table 4. Moreover, we also select some variables that, although not statistically significant according to the multivariate analysis, are considered relevant in the context of other variables that are. More specifically, although some application names and CPU consumption percentages are not statistically significant in this analysis, we consider all of these variables useful. These differences arise because some of them correspond to OS applications that are generally among the highest-consuming ones, and while it would be acceptable to ignore them in the general case, edge cases such as phones with malfunctioning or malicious applications may benefit from also considering these variables.
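As an illustration of this kind of analysis, the following sketch uses the statsmodels library to obtain coefficients, errors, and p-values; it is only one possible way to reproduce a table like Table 4, and the target column name is the one used throughout this work:

```python
import statsmodels.api as sm

# Independent variables: every remaining column except the target battery level.
X = df.drop(columns=["Battery Level (%)"])
y = df["Battery Level (%)"]

# Ordinary least squares regression with an intercept term.
ols = sm.OLS(y, sm.add_constant(X)).fit()

# Coefficient, standard error, and p-value for each variable,
# filtered at the 95% significance level (p-value <= 0.05).
table = ols.summary2().tables[1]
significant = table[table["P>|t|"] <= 0.05]
print(significant[["Coef.", "Std.Err.", "P>|t|"]])
```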
In summary, the following variables were removed: Available RAM (MB), Available Storage (GB), Network Connection, Network Type, Age, and Gender. These were excluded from the training phase of the model because they are deemed not statistically significant within our dataset and should hence have minimal effect on the dependent variable. Adding these variables to the model would lead to worse learning results.

3.2.3. Data Tuning

Once the data are prepared and the variables for training the model are selected, the next step is to format the features correctly. This involves separating the input variables from the output variable, normalizing the input features, and adjusting the shape of the training variables before defining the model.
It is important to note that from this point onward, the differences between the two neural networks used in the project—namely, a Deep Neural Network (DNN) and a Recurrent Neural Network (RNN) of the Long Short-Term Memory (LSTM) type—will be discussed. The main reason for using both types of networks is that DNNs are ideal for modeling complex and nonlinear patterns, excelling in tasks such as image processing and classification thanks to their scalability, transfer learning, and exploitation of large volumes of data. LSTMs are optimal for sequential and temporal data, such as machine translation, time series analysis, and speech recognition, thanks to their ability to handle long-term dependencies and to process data in real time. Both architectures are complementary and useful for tackling problems that combine static and sequential data.

Data Settings for Deep Neural Network (DNN)

Before defining the DNN, we will store all our input variables in a new dataframe called “X”, removing those that are dependent and have minimal influence on the output variable. On the other hand, the target variable of the project will be stored in another variable called “y”.
To ensure the generalizability and reproducibility of the obtained results, as well as the statistical robustness of the DL models, the collected data are split into three sets: training, validation, and testing. The training set is the subset of the data on which the DNN is trained. Nonetheless, its performance will not be evaluated directly with the results it obtains on the training set: instead, to decide the best hyperparameter settings and architectures, its results on the validation set will be leveraged. On the one hand, this prevents overfitting, as models with overfitting issues only perform well on data they have been trained on. Thus, by evaluating their performance on data they have not been allowed to train on, these models can be discarded. On the other hand, this allows for an analysis of their generalization capabilities: if a model performs equally well on data it has and has not been trained on, it can be considered to generalize well. However, if only these two subsets are leveraged, models cannot be compared against each other, as the hyperparameters for each model were already chosen using the validation set results, and they could thus be biased by the validation set. Hence, a third test set, separate from the other two, is used to perform this comparison.
To split the complete dataset into these three sets, we use the relevant functions of the sklearn library. Specifically, our code reserves 10% of the dataset for validation and 20% for the testing set. sklearn splits the data randomly to minimize biases. To ensure the reproducibility of the results, we use the random state seed 42 for the pseudo-random data shuffler. Next, the features are normalized to have a mean of 0 and a standard deviation of 1 using StandardScaler. To avoid biases, this transformation is performed on the training set and applied with the same characteristics in the rest. Specifically, the fit_transform method was used to fit the scaler using the training data as well as transform the training dataset. For the validation and testing datasets, transform was used instead, which leverages the same parameters that were fit during fit_transform to ensure consistency across datasets.
With this technique, we ensure that all features are scaled similarly. Finally, the shape of the output variables (y_train, y_val, and y_test) is adjusted to ensure that they are column vectors with the same format as the input features.
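A minimal sketch of this split and normalization, following the percentages and random seed described above (the target column name is the one used throughout this work), could be:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Separate the input features (X) from the target variable (y).
X = df.drop(columns=["Battery Level (%)"]).values
y = df["Battery Level (%)"].values

# Reserve 20% of the data for testing, then 10% of the full dataset for validation
# (0.125 of the remaining 80% equals 10% overall), using seed 42 for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.125, random_state=42)

# Fit the scaler on the training set only and reuse it for the other sets.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

# Ensure the targets are column vectors.
y_train, y_val, y_test = (a.reshape(-1, 1) for a in (y_train, y_val, y_test))
```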

Data Settings for Long Short-Term Memory (LSTM)

It is important to note that an LSTM network is a type of RNN capable of learning long-term dependencies in sequential data [35]. The process of adjusting the input and output variables follows the same sequence as described in the previous sections. However, there is one additional step required for LSTM networks.
This step is specific to LSTM models, where the data must be reshaped into a three-dimensional form: number of samples, number of time steps, and number of features. In this case, the number of timesteps is set to 1, meaning that each sample is treated as an individual time step. This reshaping is crucial because LSTMs are designed to handle sequential data and require this specific input structure. It is noteworthy that the rest of the scaling and splitting process that the dataset goes through for DNN training is also applied to the dataset when LSTM models are trained instead.
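Assuming the scaled arrays from the previous step are reused, this reshaping can be sketched as:

```python
# LSTM input shape: (samples, time steps, features); each sample is a single time step.
n_features = X_train.shape[1]
X_train_lstm = X_train.reshape(-1, 1, n_features)
X_val_lstm = X_val.reshape(-1, 1, n_features)
X_test_lstm = X_test.reshape(-1, 1, n_features)
```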

3.2.4. Model Definition

Once the variables and input and output data are in the correct format, it is necessary to define the model. This section will explain how to define the model for both the DNN and the LSTM network. It is noteworthy that the architectural design of the AI models tested, including both DNN and LSTM, as well as the values for the coefficients, its activation functions, and all the rest of the hyperparameters, were selected through empirical testing. Various values for all the parameters were tested, and those with the best validation results were finally selected for implementation and evaluation.

DNN Network Definition

For the definition of the model, the sequential model tf.keras.models.Sequential() is used to build the neural network layer by layer in a linear fashion. This model has four layers, although the exact number depends on the amount of data in the dataset, its complexity, and other factors. This definition is detailed in Table 5.
The input layer is composed of 17 neurons, which is the number of input parameters for the DNN. The first dense layer consists of 128 neurons, utilizes the ReLU (Rectified Linear Unit) activation function, and applies L2 regularization with a coefficient of 0.001. The ReLU function introduces nonlinearities into the model, allowing it to learn complex patterns [36]. Additionally, L2 regularization helps prevent overfitting by penalizing large weights in the network. The second dense layer has 64 neurons, and it also has ReLU activation and L2 regularization with the same coefficient. The third dense layer contains 32 neurons with ReLU activation and L2 regularization. Finally, the output layer consists of a single neuron, which is suitable for regression problems where a continuous value is predicted.
Once the model is defined, it must be compiled. For this, the Adam optimizer is used. Adam combines the advantages of other optimizers such as AdaGrad [37] and RMSProp [38], making it efficient and well suited for handling large datasets and high-dimensional networks. The learning rate is set to 0.001, which controls the adjustment of the network’s weights in each iteration. Additionally, the selected loss function is the mean squared error (MSE). This function measures the average difference between the model’s predicted values and the actual values, and it is commonly used in regression problems.
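A minimal sketch of this architecture and compilation step with the Keras API (the listing in Appendix A may differ in its details) is shown below:

```python
import tensorflow as tf

# DNN described above: 17 input features, three ReLU dense layers with L2 regularization
# (128, 64, and 32 neurons), and a single output neuron for the regression target.
l2 = tf.keras.regularizers.l2(0.001)
dnn_model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(17,)),
    tf.keras.layers.Dense(128, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dense(64, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dense(32, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dense(1),
])

# Adam optimizer with a learning rate of 0.001 and mean squared error as the loss.
dnn_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")
```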

LSTM Network Definition

The definition of the LSTM network differs from that of the DNN in that the LSTM network includes an LSTM layer at the beginning. As with the DNN, Table 6 is included to visually present the structure.
The first layer is an LSTM layer with 128 units. LSTMs contain memory cells and gates that control the flow of information, allowing the network to learn temporal dependencies. Similar to the DNN, the activation function used is ReLU, and L2 regularization with a coefficient of 0.001 is applied to prevent overfitting.
Following the LSTM layer, two dense layers with 64 and 32 neurons, respectively, are added, each using the ReLU activation function and L2 regularization. These layers perform nonlinear transformations of the features extracted by the LSTM, and they are identical to the corresponding layers in the DNN.
Finally, as in the DNN, the network has a dense output layer with a single neuron, which is suitable for regression problems where a continuous value is predicted. For the compilation, the process is the same as in the DNN, i.e., using the Adam optimizer and the mean squared error (MSE) as the loss function.
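Following the same conventions, a sketch of the LSTM variant could be:

```python
import tensorflow as tf

# LSTM variant: an LSTM layer with 128 units followed by the same dense layers as the DNN.
l2 = tf.keras.regularizers.l2(0.001)
lstm_model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(1, 17)),  # (time steps, features)
    tf.keras.layers.LSTM(128, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dense(64, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dense(32, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dense(1),
])

lstm_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")
```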

3.2.5. Model Training

Once the model has been defined and compiled, it is time to train the model using the data that has been loaded into the variables X_train and y_train. Following the same principles applied during data tuning, the split considers 70% of the data for training, while 10% is reserved for model validation, and the remaining 20% is the testing set, all of which have been split pseudo-randomly to ensure both statistical robustness and replicability.
To perform the training, a technique known as Early Stopping is employed. This technique is designed for training neural networks and is intended to monitor the loss on the validation set, stopping the training when no significant improvements are observed after a specified number of epochs. This helps prevent the model from overfitting while retaining the best weights obtained during the training process.
The parameters for this Early Stopping technique are as follows:
  • monitor=’val_loss’: Monitors the loss on the validation set (val_loss). Early Stopping will halt the training if this metric does not improve after a specified number of epochs.
  • patience=20: Indicates the number of epochs to wait without improvements before stopping the training.
  • restore_best_weights=True: Restores the model weights to the configuration where the best validation metric was obtained at the end of the training.
Once this technique is defined, it will be included in the training call, i.e., model.fit():
  • model.fit: Method that trains the model using the input data (X_train) and the labels (y_train).
  • epochs=200: Maximum number of epochs the model will train. An epoch represents a complete cycle through all the training data.
  • batch_size=32: Number of samples processed before updating the model weights. A batch size of 32 means that 32 samples will be processed at once before an update.
  • validation_split=0.2: Proportion of training data reserved as a validation set. Here, 20% of the data will be used to evaluate the model’s performance after each epoch.
  • callbacks=[early_stopping]: List of callbacks provided during training. In this case, Early Stopping is used to monitor the loss on the validation set and halt training if no significant improvements occur.
Finally, the training process returns an object named training_history that contains information about the loss metric at each epoch. This history will be used to evaluate how the model learns over time and to identify any issues with overfitting.
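One possible way to wire these pieces together is sketched below; here the separate validation set created during data tuning is passed explicitly, whereas the listing in Appendix A may instead rely on the validation_split parameter described above:

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop training when the validation loss has not improved for 20 epochs,
# restoring the best weights observed during training.
early_stopping = EarlyStopping(monitor="val_loss", patience=20, restore_best_weights=True)

training_history = dnn_model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=200,
    batch_size=32,
    callbacks=[early_stopping],
)
```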

3.2.6. Model Evaluation

Once the model has been trained, the next step is to evaluate it and examine its results. To perform this evaluation, the model’s performance has been assessed using several factors:
  • Predictions using the function model.predict(X_test): This function is employed to utilize the trained model for making predictions on the test dataset X_test. To compare the predictions against the actual values, a loop iterates through all the predictions and their corresponding actual values (y_test), which are displayed for visual comparison. This aids in understanding how closely the model’s predictions align with the real values. For the evaluation of the LSTM model, the same dataset has been used; however, it has been preprocessed into sequential input vectors to accommodate the temporal structure required by the LSTM model. Each input vector consists of a sequence of time-ordered data points that the LSTM uses to capture temporal dependencies and relationships in the data.
  • Model evaluation using the function model.evaluate(X_test, y_test): This function is utilized to assess the model’s performance on the test dataset. It returns the loss of the model on the test data, which is displayed on the screen. In this case, the MSE is used as the loss metric. Additionally, the mean absolute error mean_absolute_error(y_test, predictions) and the R² score r2_score(y_test, predictions) are calculated to obtain more information about the model. The former calculates the mean absolute error (MAE), which measures the average error of the model’s predictions, while the latter computes the coefficient of determination (R²), indicating how well the model’s predictions fit the actual values. An R² value close to 1 indicates a good fit.
  • Calculating the accuracy percentage: A relative calculation is performed to determine the model’s accuracy where a threshold of 10% is defined. The number of predictions within 10% of the actual value is computed using the expression np.sum(np.abs(predictions - y_test) <= threshold * y_test). Finally, the percentage is obtained by dividing the number of correct predictions by the total predictions, thus calculating the percentage of predictions that fall within this threshold.
  • Exporting the trained network: Once the training and evaluation have been completed, the trained model will be saved to a file with a specified name. This allows for the model to be loaded and utilized later without the need for retraining, enabling it to be tested with new data to verify the effectiveness of the training.
  • Visualization of loss during training: Functions from the Matplotlib library will be employed to observe the training and validation loss across epochs with the resulting figures being included in Section 4. This visualization enables an understanding of how the model has learned during the training process.
  • Plot of predictions vs. actual values: Matplotlib will again be used to create a plot displaying the model’s predictions against the actual values, also shown in Section 4, along with a reference line indicating the position the prediction points would have if they were perfect.
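A condensed sketch of these evaluation steps for the DNN (the output filename is hypothetical; the same calls apply to the LSTM with the reshaped inputs) could look as follows:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Predictions on the held-out test set and the test loss (MSE).
predictions = dnn_model.predict(X_test)
test_loss = dnn_model.evaluate(X_test, y_test)

mae = mean_absolute_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

# Relative accuracy: share of predictions within 10% of the actual value.
threshold = 0.10
correct = np.sum(np.abs(predictions - y_test) <= threshold * y_test)
accuracy = 100.0 * correct / len(y_test)
print(f"MSE={test_loss:.2f}  MAE={mae:.2f}  R2={r2:.3f}  accuracy(±10%)={accuracy:.2f}%")

# Save the trained model so it can be reloaded later without retraining.
dnn_model.save("battery_dnn_model.keras")
```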
It is noteworthy that all these tools and functions used to evaluate our model are also applied to the DNN network. The following section presents the results obtained from the aforementioned evaluation functions for both types of networks running on different mobile devices.

4. Results

The results obtained are discussed below. For this purpose, first, the mobile devices used in the experiments are described, and, second, for each of them, the results obtained for each type of neural network, both DNN and LSTM, are discussed.

4.1. Setup

To achieve more realistic results, real Android mobile devices from different manufacturers and with different characteristics have been considered. The details of the different devices with which the experiments have been performed are listed in Table 7.
The mobile application developed and detailed at the beginning of this work was installed on these devices, which were used by real users following their normal, everyday usage pattern. However, with the goal of capturing a wide range of real-world usage behaviors, the OnePlus_LE2123 was used by a user who experienced more extreme scenarios, such as frequent travel and heavy use of resource-intensive apps like maps applications. In contrast, the Samsung-SM-G991B was used for a shorter period of 15 days, focusing on evaluating the model’s performance with fewer data points. This diversity of devices, coupled with varying durations of use, allows for a thorough examination of the model’s adaptability to different data volumes and usage intensities.
The test lasted one month on most devices, and data were collected every 20 min to ensure a comprehensive picture of usage patterns. The one-month duration was chosen to capture variations in user behavior over time, including differences in usage during weekdays versus weekends, as well as potential fluctuations caused by external factors such as user schedules, device settings, or environmental conditions (e.g., temperature affecting battery performance). This period also allows the observation of long-term trends, such as gradual battery degradation or changes in usage habits.
The data collection interval of 20 min was selected as a balance between granularity and efficiency. A shorter interval could have imposed excessive processing and storage demands on the devices, potentially interfering with normal usage. On the other hand, a longer interval risked missing critical details in the data, such as the impact of transient high-resource activities (e.g., app launches, navigation, or video streaming). The 20-minute interval ensures that the collected data accurately represent real-world usage patterns while minimizing any potential impact on device performance or user experience.
The results of the data capture for each device are presented in Table 8, Table 9, Table 10, Table 11 and Table 12. These tables specifically highlight the top five most-used applications for each device along with the frequency of usage (i.e., the number of times each application was opened) and the total usage time.
A detailed analysis of these tables reveals significant variability in user profiles and behaviors. For instance, the user of the OnePlus_LE2123 demonstrated intensive usage patterns, focusing heavily on travel-related applications such as Google Maps and Ryanair, combined with the frequent consumption of multimedia content. In contrast, the POCO_M2102J20SG device was primarily used for gaming, with the user dedicating a significant portion of their time to video games. The remaining devices reflect more typical day-to-day usage patterns with a balanced focus on social media platforms and multimedia consumption.
This variability underscores the diversity in user behaviors, ranging from intensive and specific application usage to more general and moderate day-to-day activities. Such differences provide valuable insights into the varying levels of usage exhaustiveness captured during the study, further highlighting the robustness and representativeness of the dataset.
The main objectives of the evaluation are to assess the precision of DNN and LSTM neural network models for battery life predictions in the devices listed above, to compare the results across DNNs and LSTMs based on different metrics, to compare DNNs and LSTMs with the baseline AI methods of linear regression and decision trees, and to analyze the value of the loss function over time with both types of neural network and in all devices analyzed. For training and testing, the dataset collected from each device was divided with the same method as during the data tuning. Specifically, the complete dataset was split pseudo-randomly, with 70% of the data used for model training, 10% for validation, and 20% for testing.

4.2. Results Analysis

Before proceeding, it is important to note that for each dataset, adjustments can be made to the neural network’s architecture. The training of a neural network is highly dependent on the nature of the data, as simpler datasets with lower volumes will require different configurations compared to more complex datasets with larger amounts of data.

4.2.1. Device 1: OnePlus_LE2113

At this point, the prediction for the OnePlus device model LE2113 will be performed, using a dataset of approximately 1000 rows. The same dataset will be applied to both the DNN and the LSTM networks. On one hand, for the DNN, we use four layers, where the first layer has 256 neurons, the second has 128, the third has 64, and the last layer has a single neuron. This configuration allows the network to learn from more complex data. The results from this neural network are detailed in the following.
The MSE value is the loss metric that measures the average of the squared errors between the predictions and the actual values. An MSE of 70.0478 indicates that on average, the squared prediction errors are approximately 70. The lower this value, the better the model’s performance. However, it is common to obtain this result initially, as the model is still learning, and it is challenging to achieve good results in regression problems. The MAE value measures the average absolute errors between the predictions and the actual values. An MAE of 5.4225 means that on average, the model’s predictions are 5.42 units away from the actual value. Similar to MSE, a lower MAE is better because it indicates that the predictions are closer to the actual values. An MAE of five can be considered acceptable.
The R² value indicates the proportion of the variance in the dependent variable (actual values) that is predictable from the independent variables (predictions). An R² of 0.8923 suggests that approximately 89.23% of the variance in the actual data is explained by the model’s predictions, which is thus a positive result. The accuracy percentage indicates the percentage of predictions that fall within 10% of the actual value. An accuracy of 67.68% means that approximately 67.68% of the predictions are within ±10% of the actual value. This provides a sense of how often the model makes reasonably accurate predictions. Therefore, it can be observed that reasonably good characteristics have been obtained for a regression problem where it is challenging to achieve high accuracy. Moreover, it is important to note that the model has an accuracy of approximately 70%, with predictions differing by no more than 5 units from the actual value.
Finally, Figure 4 illustrates the predictions of the DNN. In this figure, each dot represents a testing example that has been used to evaluate the DNN, where its position on the X-axis represents the ground truth (i.e., the real value from the dataset), and its position on the Y-axis represents the value predicted by the DNN for that example. The closer the dots are to the diagonal, represented as a red line for clarity, the better the predictions of the DNN are, as they are closer to the real values. In this case, most points are close to the red line, i.e., near their actual value. However, some points deviate from the red line; these fall outside the ±10% threshold used to compute the approximately 70% accuracy mentioned earlier.
On the other hand, for the LSTM network, the same steps are followed as those used to observe the results in the DNN. The same number of layers and neurons are used, but it is important to recall that the structure and the first layer of the LSTM network differ from those of the DNN. It is observed that the results are very similar to the DNN, with an MSE value of 65.46, which is due to the fact that the model is still learning and there is more data loss at the beginning. When looking at the MAE value, it is slightly lower, but it still indicates that the values differ by no more than five units. For the R² value, a slight improvement is observed, although it remains a very good result, as a value closer to 1 indicates a better model. Finally, the accuracy percentage improves slightly, meaning that more predictions fall within the accuracy range. Some predictions vs. actual values are also displayed.
It can thus be seen that there is nearly the same number of accurate predictions within the established percentage as with the DNN. Perhaps the LSTM network has slightly more correct predictions since its percentage is higher, but there is no clear difference between the two models. As shown in Figure 5, which shows predictions and real values with the same representation as before, the only noticeable difference is that perhaps there are a few more points near the red line due to the higher accuracy percentage.
This concludes the initial results for our first user and their mobile device. In conclusion, it can be said that with both models, approximately 70% accuracy was achieved when attempting to predict the battery level of the mobile device. Considering this is a regression problem, these results are satisfactory, as achieving higher accuracy is challenging. However, testing will continue with different mobile devices from other users to see how the models respond to different data.
Next, Figure 6a compares the DNN and LSTM models with the two baseline AI methods, linear regression and decision trees. The linear regressor obtains significantly worse results than the rest of the methods, with an MSE orders of magnitude above the others (9095.99, compared with 70.04 for the model with the second worst results), an MAE almost six times larger, less than half the accuracy, and an R² of −13, meaning that its predictions are significantly worse than simply assuming a fixed value. Decision trees, on the other hand, perform comparably to the DL models and close to the LSTM (65.69 MSE versus 65.46 for the LSTM, 5.35 MAE versus 5.28, R² of 0.895 versus 0.899, and 68.49% accuracy versus 69.7%). Hence, the data have a structure too complex to be captured by linear regression and require more expressive models. Regarding the DNN/LSTM comparison, at the beginning of training (first epochs) the loss on both the training and validation sets decreases rapidly, indicating that the model is learning effectively and fitting the data well. After approximately 25–50 epochs, the loss begins to stabilize and decreases more slowly, suggesting that the model is reaching its maximum learning capacity and approaching a local minimum of the loss function. Toward the end of training (around 200 epochs), the loss has stabilized, indicating that the model has reached equilibrium and is not improving significantly with additional epochs. This also suggests that Early Stopping was not strictly necessary up to this point, as the losses do not increase, which would indicate overfitting. All of this can be seen in Figure 6b.
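As an illustration of how such baselines can be evaluated under the same metrics, a minimal scikit-learn sketch follows; the feature matrix X, target y, split ratio, and random seed are assumptions rather than the exact experimental setup.

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def evaluate_baselines(X, y):
    """Fit the two baseline regressors and report the same metrics used for the DL models."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    baselines = [("Linear regression", LinearRegression()),
                 ("Decision tree", DecisionTreeRegressor(random_state=42))]
    for name, model in baselines:
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        print(f"{name}: MSE={mean_squared_error(y_test, pred):.2f}, "
              f"MAE={mean_absolute_error(y_test, pred):.2f}, "
              f"R2={r2_score(y_test, pred):.3f}")
```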

4.2.2. Device 2: OnePlus_LE2123

Next, we test the models with a different dataset. Coincidentally, it comes from the same brand, but a different model and a different user, so the results will differ. Since the exact same steps as before are followed, we go straight to the results without repeating the details of the data visualization, as it mirrors the previous process. For training, the dataset contains around 1600 rows. Below, the results for both neural networks are presented. For the DNN, we keep the same number of layers; however, this time the first layer has 128 neurons, the second 64, the third 32, and the last a single neuron. The number of neurons was reduced compared to the previous user's dataset because a higher number of neurons led to overfitting and, hence, poor performance.
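A minimal Keras sketch of a DNN with this reduced 128/64/32/1 configuration is shown below, assuming the 17 input features, ReLU activations, and L2(0.001) regularization listed in Table 5; the optimizer and loss settings are illustrative choices, not necessarily those of the original implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_dnn(n_features=17):
    """Reduced 128/64/32/1 architecture with ReLU and L2(0.001), mirroring Table 5."""
    model = tf.keras.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(128, activation="relu", kernel_regularizer=regularizers.l2(0.001)),
        layers.Dense(64, activation="relu", kernel_regularizer=regularizers.l2(0.001)),
        layers.Dense(32, activation="relu", kernel_regularizer=regularizers.l2(0.001)),
        layers.Dense(1, kernel_regularizer=regularizers.l2(0.001)),  # battery level output
    ])
    # Optimizer and loss are illustrative choices for a regression target
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```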
It is worth noting that this user's dataset yielded the worst results for both models. This may be because it is the largest dataset we have worked with, which implies more data and, therefore, more variety, making it harder for the model to generalize. After thoroughly reviewing the data, it was observed that the applications in use vary greatly, which adds noise that the model struggles to capture. Hence, the number of neurons was reduced to prevent the model from memorizing the data and overfitting. The MSE is significantly higher than for the first user, reflecting a greater average squared difference between the predicted and actual values; as mentioned, this is because the data are more complex and diverse than in the first case. Since the MSE is higher, the MAE is also higher, as the predictions are worse: in this case, the predictions differ by up to eight units, which is still within the accuracy range. Consequently, the R² also worsens, moving further from 1. Finally, as expected, the relative accuracy percentage decreases to 60%, approximately 10 percentage points lower than for the first dataset.
Next, some predictions versus actual values are shown, using the same representation as in previous figures. As seen in Figure 7, the points are more dispersed compared to the graph of the first dataset.
To test the LSTM network, the same steps were followed as for the DNN, using the same layers and neurons. We therefore focus solely on the MSE, MAE, and R² results, together with the graph of predictions versus actual values. A slight improvement is observed in all variables, but it is not significant enough to suggest that the LSTM is superior to the DNN. The MSE decreases by approximately 13 units, although it remains higher than the value obtained with the LSTM for the first dataset. Likewise, the MAE improves along with the MSE, but the improvement is minimal (about 0.20), meaning the predictions are nearly the same within the accuracy range. For this reason, only the graph is shown this time rather than the individual predictions, with the DNN results serving as a reference. The R² also improves, specifically by 0.2, a modest difference, but any movement toward 1 is a positive outcome for model performance. Finally, the accuracy percentage improves by 2% compared to the DNN, meaning a few more predictions fall within the accuracy range. Figure 8, with the same dot-and-line representation used in the other figures, shows a distribution of points relative to the red line similar to that of the DNN, as the evaluation variables for both models are nearly identical.
This concludes the evaluation of both models for the second user’s data. Figure 9a,b summarize the results and the loss generated during the process. It has been observed that with a larger dataset, there is a higher likelihood of greater data variety, which can confuse the model and lead to poorer performance.

4.2.3. Device 3: Samsung_SM-A226B

The next user has a Samsung device, model SM-A226B. For this evaluation, we work with a small dataset containing 120 rows and apply both models to observe the results. To test the DNN, we used the same configuration as with the previous user: four layers, where the first layer contains 128 neurons, the second 64, the third 32, and the last a single neuron. These layer sizes were chosen because, given the small dataset, numerous neurons are not needed; moreover, since the data do not exhibit complex relationships, increasing the number of neurons could result in overfitting.
The results appear very good at first sight and are, in fact, the best among the users evaluated. With a small dataset, the model can learn patterns more easily, and since the data are consistent and not highly varied, they are easier for the model to fit. We obtained an MSE of 16.31, indicating a very low error after training, and the loss is very low in the final epochs. Accordingly, the MAE is also very good, specifically 2.69, meaning that predictions deviate by at most roughly this many units; this represents predictions that are very close to the actual values. The R² value is very close to 1, indicating that approximately 91% of the variability in the data is explained by the model, which is a very good result. Finally, the accuracy percentage is also very high, at 94.74%, indicating that most predictions fall within the established accuracy threshold. As can be seen in Figure 10, most of the predictions are in the established range, so the model was able to predict the battery level in most cases.
The loss function (Figure 11b) is almost identical to that of the first user (which is why it was not shown for the second user), except that in this case the loss is higher at the beginning of training. The network then learns very quickly, stabilizing between epochs 25 and 50, indicating that it reaches its maximum learning capacity; toward the final epochs, training finishes with the network fully trained. Looking at the predictions in Figure 10, with the same representation used for the earlier devices, most points are close to the red line, meaning the predictions are very close to the actual values, as observed earlier in the prediction vs. actual value figures. However, unlike the other cases, these results are not reliable: to properly test a model, a much larger dataset is needed so that the model is exposed to data variability and learns better. In this case, the model did not so much learn from the data as memorize it, since a dataset of only around 120 very similar rows is very easy to fit.
To test the results with the LSTM network, we used the same configuration as for the DNN to see whether it would improve or worsen the results: the same dataset and the same number of layers and neurons. This time, for the first time, the LSTM yielded worse results than the DNN, although not significantly so, showing only slight declines in each variable. The MSE is higher, indicating that the LSTM took longer to learn and accumulated a higher loss during training; this suggests that the LSTM may be better suited to larger and more complex datasets. Similarly, the MAE worsened, increasing by one unit, which will hardly be noticeable in the predictions and is still a very good value for predictive performance. The R² moved further from 1, decreasing by 0.07, but it remains a very good value, indicating that the model predicts most cases within the accuracy range. Finally, the accuracy percentage also decreased, specifically by 3%; nonetheless, an accuracy of approximately 90% is still very high for a regression problem. Since the results are similar, the graphs and values are almost the same, so we only present the results and Figure 12 of the predictions versus actual values with the dot-and-line representation.
Given that the previous variables produced nearly identical results, the visual and numerical outcomes are also nearly the same, as shown in Figure 12, where most points lie near the red line. As shown in Figure 11a,b, the predictions are nearly the same within the accuracy threshold, with five predictions falling outside the accuracy range compared to four with the DNN.
At this point, we conclude the evaluation of the third user. Differences were observed compared to the other users, showing that a small dataset with simple data is more appropriate for a DNN than for an LSTM. Moreover, both models learned faster and produced better metrics. However, as mentioned for the DNN, a model trained on such a small dataset is not reliable, as it is likely to memorize the data rather than learn from it.

4.2.4. Device 4: POCO_M2102J20SG

Next, we test the model on a dataset gathered from a POCO_M2102J20SG device. This dataset contains approximately 800 rows, and the results of the different models on these data are evaluated below. Regarding the DNN, this dataset ranks third in performance when the users are ordered by their evaluation metrics. It should be noted that these metrics depend on the data used to train the model; in this case, we have a medium-sized dataset whose data are not as simple because they change more frequently.
Therefore, the MSE value is higher than for users one and three but lower than for the second user, indicating that the model struggled more with these data because they are more variable than those of users 1 and 3 yet simpler than those of user 2. Notably, this model uses Early Stopping for the first time: it learns so quickly from the data that additional epochs worsen its validation loss, leading to overfitting. The MAE is approximately 7, meaning that predictions deviate by up to 7 units above or below the actual value. The R² leans clearly toward the positive side, as it is closer to 1 than to 0, which is a good value for the model. Finally, the accuracy percentage is approximately 70%, which is a fairly good figure given the number of predictions the model makes accurately, as shown in Figure 13 through the dot-and-line representation. Most predictions fall within the acceptable range; therefore, in the predictions vs. actual values graph, most points are near the red line, i.e., within the established threshold.
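The sketch below illustrates how Early Stopping can be attached to training in Keras. The synthetic data, patience, epoch count, and batch size are assumptions rather than the exact experimental settings, and build_dnn() continues the earlier architecture sketch.

```python
import numpy as np
from tensorflow.keras.callbacks import EarlyStopping

# Synthetic stand-ins for the preprocessed features and battery levels;
# the real experiments use the collected dataset.
X_train = np.random.rand(800, 17)
y_train = np.random.rand(800) * 100
model = build_dnn()

# Stop training when the validation loss stops improving and keep the best weights seen.
early_stop = EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)
history = model.fit(X_train, y_train, validation_split=0.2,
                    epochs=200, batch_size=32, callbacks=[early_stop])
```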
The LSTM network is evaluated using the same data, layers, and number of neurons as the DNN, following the same steps. The LSTM slightly improves the MSE, MAE, and R², although the accuracy percentage decreases slightly; in essence, the characteristics are practically the same, as these small increases and decreases do not make one model definitively better than the other. Specifically, the MSE decreases by around 10 units, a minor change, although a smaller value is always preferable. It is worth mentioning that Early Stopping was also required for the LSTM, as the model quickly learns from the data and does not need many epochs. As the MSE improves, the MAE also improves, but only by 0.06 compared to the DNN, which is negligible; the predictions will still deviate by approximately 7 units in either direction. The R² improves by 0.03, and any increase toward 1 is an improvement, but such a small change does not determine which model is better. Finally, the accuracy percentage drops by 3%, which is barely noticeable, so the model has almost the same number of predictions inside and outside the threshold. As shown in Figure 14, which displays the predictions and real values in the dot-and-line representation, there is practically the same number of predictions outside the range for the LSTM and the DNN.
Therefore, we can conclude that both models yield almost identical results. Additionally, if we observe the graph, the positioning of the points relative to the reference line is similar to the DNN graph, meaning that both have most of their points near the red line, i.e., within the threshold. This is further illustrated in Figure 15a,b.

4.2.5. Device 5: Samsung_SM-G991B

Lastly, we test the final user, who has a Samsung_SM-G991B device, using a dataset containing approximately 1000 rows. The same dataset is applied to both the DNN and LSTM networks. For the DNN, unlike the last few users and similarly to the first, we have once again doubled the number of neurons in each layer: the first layer has 256 neurons, the second 128, the third 64, and the last a single neuron. It is important to recall that each model's performance depends on the data, and in this case doubling the neurons yields better results. The evaluation follows the same procedure and steps used for all users.
The results are similar to those of the first user, showing strong characteristics. The MSE on the test data is 76.01, which, as previously discussed, is not very high: initially the network is still learning and the loss is higher, but as training progresses the loss decreases. The R² value is also very good, approaching 1, meaning the model explains approximately 89% of the variability in the data. Consistently, the MAE is also strong, specifically 5.21, meaning that most predictions deviate by at most about 5 units. Finally, the accuracy percentage is 67.65%, which, although a relative measure, is a good figure for a regression problem; an accuracy close to 70% indicates good results. Most predictions fall within the range established by the MAE, and the predictions outside the accuracy threshold are highlighted in red. To better understand the results, Figure 16 displays the graph of real values vs. predicted values with the same representation used for the other devices; most points lie close to the red line, indicating they are near the actual value, while the outlier predictions are highlighted in red, as previously mentioned.
Next, we analyze the results of the LSTM network with the same dataset and characteristics: the same neurons and layers as the DNN, following the same procedure. As with the other users, the results are similar to those of the DNN and can be considered almost identical: the MSE increases by only 4 units compared to the DNN, because the LSTM takes longer to learn and accumulates slightly more loss, though a difference of 4 units is negligible when deciding whether one model is better or worse. The R² is identical up to the first two decimal places, again a good result close to 1. In terms of MAE, the LSTM improves by 0.11 over the DNN, so predictions still deviate by approximately 5 units from the actual value. Finally, the accuracy percentage increases by 2%, which is barely noticeable and leaves it nearly the same as the DNN's. Within a small segment of the results, the number of incorrect and correct predictions matches that of the DNN, since the evaluation variables are very similar and lead to nearly identical outcomes. Therefore, as shown in Figure 17, the graph of predictions vs. actual values with the dot-and-line representation is very similar to the previous one.
As in the previous examples, a comparison of the results between the DNN and LSTM networks, as well as the obtained losses, is illustrated in Figure 18a and Figure 18b, respectively.
Finally, the evaluations for all users have been completed. This user ranks third in terms of overall metrics, mainly because the nature of the data has proven to be the most crucial factor in achieving good or poor results. The results can be deemed acceptable or good for most of the users, models, neural networks, and datasets used in the evaluation, thus demonstrating the feasibility of battery life prediction from the data gathered.

5. Discussion

Based on the fundamental definition of each neural network, the LSTM should be more effective in the context of this project, as it is better suited to time-dependent series, whereas a DNN is typically stronger at static pattern classification. All variables are updated periodically, which should favor the LSTM, since it is designed for time series data [35]. However, the evaluation shows that this advantage does not clearly materialize. Although in three out of the four users tested with their respective data the LSTM achieved better metrics, the improvements were negligible, with the collected variables showing only slight enhancements, such as a 10-unit decrease in MSE, a 0.5 decrease in MAE, or a 3% increase in accuracy. Given these results, we cannot conclude that the LSTM would have been the appropriate network for this project. The reason LSTMs do not outperform DNNs is that, although the temporal aspect is relevant to this problem, the relationship between the features and the target variable is significantly stronger. Hence, the recurrent features of LSTMs, rather than providing additional valuable information, add noise to the data, which worsens the LSTM's results and makes the DNN a more capable model in most cases.
As a general rule, a DNN works better when the data collected from the user (such as app usage and device settings) are not strongly related to time. For example, if the dataset only contains static information about used apps, brightness settings, and other features that do not depend on the specific time of day or the sequence of actions, a DNN can be sufficient to predict battery consumption. However, an LSTM is more suitable when the data contain a significant temporal dependency, such as the sequence of activities throughout the day. For instance, battery consumption can be influenced by the use of certain apps at specific times (such as navigation in the morning and video streaming in the afternoon). An LSTM can learn and predict how these interrelated actions affect battery life more accurately, as it can retain information over time and model interactions between sequential events. These are the main reasons that led the authors of this work to evaluate and compare both approaches in order to test the hypothesis of predicting battery consumption in smartphones.
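To make this difference concrete, the following sketch shows one way the same measurements can be shaped for each architecture: independent rows for a DNN versus overlapping windows for an LSTM. The window size and the synthetic data are illustrative, since the exact windowing used in the experiments is not detailed here.

```python
import numpy as np

def make_sequences(features, targets, window=5):
    """Turn per-measurement rows into overlapping windows for an LSTM.
    A DNN consumes each row independently with shape (n_features,);
    an LSTM consumes (window, n_features) so it can exploit the ordering."""
    X_seq, y_seq = [], []
    for i in range(len(features) - window):
        X_seq.append(features[i:i + window])  # the last `window` measurements
        y_seq.append(targets[i + window])     # battery level right after the window
    return np.array(X_seq), np.array(y_seq)

# Example with hypothetical data: 100 measurements with 17 features each
X = np.random.rand(100, 17)
y = np.random.rand(100) * 100
X_seq, y_seq = make_sequences(X, y, window=5)
print(X_seq.shape, y_seq.shape)  # (95, 5, 17) (95,)
```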
It is also noteworthy that the results collected refer to aggregated results over executions in the same dataset, and thus, there may be slight variations across executions. For example, an MAE of 5 does not necessarily mean that every prediction will be off by exactly 5 units, and the neural network can be more or less precise in specific executions. Nonetheless, aggregated metrics are used to provide a better overview of the precision to be expected.
Based on these tests, the following differences between the two networks can be highlighted:
  • Execution Speed: After running the models on Google Colab, it was observed that the LSTM network takes longer to execute than the DNN. This is due to the more complex structure of the LSTM, which includes temporal dependencies, while the DNN performs independent calculations in each iteration.
  • Learning Time: As mentioned earlier, the presence of time series data affects execution time. Therefore, an LSTM network requires more time to learn compared to a DNN due to its more complex computations.
  • Initial Data Size: The only instance where clearly better results were obtained with the DNN occurred with the smallest dataset, indicating that the DNN copes better with little data. In contrast, with the larger datasets of the other three users, the LSTM performed slightly better thanks to its ability to retain information across longer runs of data.
  • Data Complexity: Similarly, while the size of the data matters, its complexity is also crucial. Simple data relationships that do not vary much are better suited for a DNN, whereas complex data that fluctuate constantly are better processed by an LSTM.
In conclusion, it can be inferred that an LSTM network is more suitable for datasets where temporal dependency is involved. However, in this project, it was observed that there are no significant improvements by using one model over the other.
Moreover, mobile devices with different technical characteristics and from different manufacturers have been used to evaluate the feasibility and performance of the machine learning models. This diversity of devices has not only allowed us to validate the generalization of the models across a wide range of configurations and usage scenarios but also provided valuable information to adjust their adaptability to different contexts. In addition, the approach considers different user-specific scenarios, recognizing that battery consumption can vary significantly depending on the environment and circumstances, such as during working hours, during vacation periods, or in other particular contexts. As part of our future plans, a significant extension of this work is envisaged to incorporate a more holistic and dynamic approach, allowing a user with multiple devices, such as tablets and wearables, to contribute to the construction of a personalized model that combines and synergistically leverages the information gathered from all of their devices. Another planned evolution is to generate the models on the mobile devices themselves. Mobile devices offer increasingly high performance, and DL tools such as TensorFlow Lite allow models to be generated and executed directly on the devices. This would remove the intermediate step of sending the data to an external server and would also increase the security of user data. Thus, the approach will ensure that the system is more robust, flexible, and scalable, adapting not only to the diversity of devices and user needs but also to changes in their usage patterns, optimizing resources efficiently, and facilitating deployment in different technological environments.
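As a rough illustration of this on-device direction, a trained Keras model can be exported to TensorFlow Lite as sketched below. The stand-in model, the file name, and the optimization flag are illustrative; in practice, the exported model would be the trained battery-prediction network.

```python
import tensorflow as tf

# `model` is assumed to be a trained Keras battery-prediction model;
# a trivial stand-in is built here only so the sketch is self-contained.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(17,))])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

with open("battery_model.tflite", "wb") as f:
    f.write(tflite_model)  # the .tflite file can then be bundled with the Android app
```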
A limitation of this work is the scalability of the proposed solution. Although the current approach has demonstrated its effectiveness in a variety of devices and usage scenarios, further research is needed to ensure its applicability in larger-scale deployments, such as when integrating multiple users or devices into a single predictive system.
Another important aspect is ethical considerations, in particular the privacy of user data. This work inherently involves the collection of sensitive data on user behaviors and device usage. To mitigate risks, all data collection and storage processes adhere to strict privacy-preserving practices, ensuring that no personally identifiable information is recorded or shared. Future work will continue to explore privacy-preserving machine learning approaches, such as federated learning, to enhance security and ensure compliance with data protection regulations [38].

6. Related Work

Different research studies have focused on predicting various hardware parameters in mobile devices, such as CPU usage, memory, and power consumption. These studies focus on optimizing resource management, enabling better system efficiency through DL models [39,40]. In our case, the goal is not to optimize system resource management, as mobile devices are closed systems with limited access to internal hardware configuration and resources. In this area, there are some proposals from researchers; for example, Ding et al. [41] model user behavior on smartphones, proposing a system that identifies app usage patterns and how they affect energy consumption, using behavioral mining to optimize the use of device components. Our work advances this line by applying deep neural networks that optimize battery usage by adapting in real time to the characteristics and behavioral patterns of each user.
Neto et al. [42] develop energy consumption models based on user usage patterns using deep neural networks. Their approach analyzes data collected on smartphone usage, identifying the components that most influence power consumption. While their work is relevant, our study differs by focusing on personalizing battery consumption through predictive models based on individual characteristics.
Singh et al. [11] present an extensive analysis of techniques for state-of-charge (SoC) estimation and battery life prediction. They provide a qualitative comparison of different approaches used to predict battery life, which is relevant for optimizing power usage in mobile devices. However, their approach does not include the customization of prediction models according to user behavior, as proposed in this work, which further highlights the need to investigate the path of our proposal.
Xia et al. [43] propose DeepApp, which is an app usage prediction system for smartphones based on DL. The model predicts which apps the user will use next, improving the experience and optimizing device performance. While DeepApp predicts app usage, our approach focuses on predicting battery consumption based on user usage and habits.
Finally, Çiçek et al. [44] present a ConvLSTM-based system that predicts the remaining battery capacity in the next 24 h, automatically adjusting settings such as screen brightness or disabling non-essential functions (Wi-Fi, GPS) to optimize power consumption. The main difference with our work is that while their approach focuses on automatic hardware optimizations, our research employs DL models to personalize battery predictions based on user behavior, dynamically adapting to individual usage patterns.

7. Conclusions

  The importance of resource consumption in mobile devices is a crucial factor today, as both developers and consumers consider battery health and longevity to be among the most significant concerns. This work delves into identifying the key factors influencing battery consumption to optimize resource usage and improve battery life in smartphones.
This work highlights the growing relevance of resource consumption in mobile devices, particularly concerning battery life, which is a critical factor for both users and developers. The analysis of energy consumption factors revealed that data- and processing-intensive applications are the primary contributors to battery drain. The development of an application to collect user data enabled the training of DNN and LSTM models, which proved effective in predicting consumption patterns. LSTM networks, in particular, excelled in predictions based on historical data, while DNNs were useful for classifying complex patterns. These findings suggest that integrating predictive models into operating systems could optimize battery usage by dynamically adjusting resources based on user behavior. This work lays the groundwork for future research aimed at driving more efficient and adaptive solutions for energy management in mobile devices.
Regarding future research, several lines of investigation are being pursued. One involves developing mechanisms to optimize real-time energy consumption, dynamically adjusting resources based on usage predictions. Another focuses on applying this approach to other devices, such as wearables and Internet of Things devices, while also collaborating with hardware research to create more efficient components. Another aspect is that models can be trained on the devices themselves and combined with models from other users through federated learning to generate richer and more robust models.

Author Contributions

Software, D.F.-M. and S.L.; Validation, J.L.H.; Investigation, D.F.-M., S.L. and J.L.H.; Writing—original draft, D.F.-M., S.L. and J.L.H.; Funding acquisition, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially funded by the grant DIN2020-011586, and projects PDC2022-133465-I00, and TED2021-130913B-I00 funded by MICIU/AEI/10.13039/50100011033 and the “European Union NextGenerationEU/PRTR”, by the European Commission under the project “Extremadura EDIH T4E: Tech for Efficiency” (Project ID: 101083667) within the framework of the European Digital Innovation Hubs initiative, Call: DIGITAL-2021-EDIH-01, by the Department of Economy, Science and Digital Agenda of the Government of Extremadura (GR21133), and by the European Regional Development Fund.

Data Availability Statement

The data collected and used in this project are available for review and analysis at https://doi.org/10.5281/zenodo.13982080.

Acknowledgments

Special thanks to our colleague Samuel Trinidad Vázquez for the implementation of the mobile application and collection of user data. We also thank the COMPUTAEX Foundation for allowing us to use the computational resources of the LUSITANIA supercomputer.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Implementation Code

Appendix A.1. Data Preprocessing—First Step

Listing A1. Reading of the CSV file of a OnePlus LE2123 and removal of blank rows.
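A minimal pandas sketch of what this first preprocessing step can look like is shown below; the file name is an assumption and does not reproduce the original listing.

```python
import pandas as pd

# Load the collected measurements and drop fully blank rows.
df = pd.read_csv("OnePlus_LE2123.csv")
df = df.dropna(how="all")        # remove rows where every column is empty
df = df.reset_index(drop=True)   # keep a clean, contiguous index
print(df.shape)
```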

Appendix A.2. Data Preprocessing—Second Step

Listing A2. Selection of desired columns in accordance with the selected variables.
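Continuing the previous sketch, the column selection step could look as follows; the column names are illustrative and merely mirror the variables in Table A2 rather than the exact CSV headers.

```python
# Keep only the variables selected as model inputs and target.
selected_columns = (
    ["battery_level", "charge_status", "battery_temperature"]
    + [f"app_{i}_name" for i in range(1, 6)]
    + [f"app_{i}_seconds" for i in range(1, 6)]
    + ["ram_used", "storage_used", "network_connection", "network_type",
       "screen_on_time", "traffic_received", "traffic_sent"]
)
df = df[selected_columns]
```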

Appendix A.3. Data Preprocessing—Third Step

Listing A3. Generating unique application mappings for the selected variables.
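A minimal sketch of the mapping step follows, continuing the previous sketches; the column names are again hypothetical.

```python
import pandas as pd

# Map application package names to integer identifiers so the networks can consume them.
app_columns = [f"app_{i}_name" for i in range(1, 6)]
unique_apps = pd.unique(df[app_columns].values.ravel())
app_to_id = {app: idx for idx, app in enumerate(unique_apps)}

for col in app_columns:
    df[col] = df[col].map(app_to_id)
```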

Appendix A.4. Data Settings for DNN

Listing A4. Selecting input and output variables from the dataset.

Appendix A.5. DNN Model Definition

Listing A5. DNN Model definition and the composition of each layer.

Appendix A.6. LSTM Model Definition

Listing A6. LSTM Model definition and the composition of each layer.
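A minimal Keras sketch mirroring the LSTM structure in Table 6 could look as follows; the input shape of one timestep with 17 features, the optimizer, and the loss are assumptions rather than the exact original configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_lstm(timesteps=1, n_features=17):
    """LSTM layer of 128 units followed by dense layers with L2(0.001), mirroring Table 6."""
    model = tf.keras.Sequential([
        layers.Input(shape=(timesteps, n_features)),
        layers.LSTM(128, activation="relu", kernel_regularizer=regularizers.l2(0.001)),
        layers.Dense(64, activation="relu", kernel_regularizer=regularizers.l2(0.001)),
        layers.Dense(32, activation="relu", kernel_regularizer=regularizers.l2(0.001)),
        layers.Dense(1, kernel_regularizer=regularizers.l2(0.001)),
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```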

Appendix A.7. Model Training

Listing A7. Model compiling detailing the number of epochs and the batch size.

Appendix A.8. Model Prediction

Listing A8. Model prediction.

Appendix A.9. Model Evaluation

Listing A9. Model evaluation.

Appendix A.10. Model Accuracy Calculation

Listing A10. Model accuracy after training and evaluation.

Appendix B. Libraries Leveraged in the Mobile Application

Appendix B.1. List of Libraries

The main libraries used for collecting metrics in the application are listed in Table A1 and described below.
Table A1. Android libraries used in the mobile application and the measured metrics.
Library | Metric
BatteryManager | Battery
UsageStatsManager | Application usage
UsageStatsManager.queryUsageStats | Application usage
UsageStats | Application usage
ActivityManager | RAM memory
ActivityManager.MemoryInfo | RAM memory
Environment.getDataDirectory() | Storage
StatFs | Storage
StatFs.getBlockSizeLong() | Storage
StatFs.getBlockCountLong() | Storage
StatFs.getAvailableBlocksLong() | Storage
ConnectivityManager | Networking
NetworkCapabilities | Network connection
NetworkInfo | Network connection
SharedPreferences | Display on time
SharedPreferences.Editor | Display on time
TrafficStats | Network traffic
  • Battery: To monitor the battery, the BatteryManager class has been used, which provides methods to query the device’s battery status. It is specifically used to obtain detailed information such as temperature, charge level, and the current state of the battery (charging, discharging, etc.).
  • App usage: For capturing app usage statistics, the following libraries have been used within the same class: UsageStatsManager, UsageStatsManager.queryUsageStats, and UsageStats. UsageStatsManager allows access to app usage statistics on the device, obtaining an instance through the getSystemService method. Through queryUsageStats, app usage statistics over specific time intervals are retrieved; in this case, daily usage is queried, obtaining the total time each app has been in the foreground. UsageStats contains information about the usage of each app, including the total time it has been active. Additionally, these libraries will be used later to estimate the CPU usage percentage of the applications.
  • RAM usage: To manage and monitor RAM usage, the ActivityManager and ActivityManager.MemoryInfo classes were used. ActivityManager provides access to information about the state of system activities, including RAM usage. MemoryInfo allows retrieving details such as the total, available and used memory on the device.
  • Storage: For storage management, the following classes were used: Environment.getDataDirectory(), StatFs, StatFs.getBlockSizeLong(), StatFs.getBlockCountLong(), and StatFs.getAvailableBlocksLong(). Environment.getDataDirectory() provides a reference to the internal storage data directory, while StatFs is used to obtain file system statistics, such as block size, total blocks, and available blocks. getBlockSizeLong returns the size of each block in bytes, getBlockCountLong returns the total number of blocks in the system, and getAvailableBlocksLong provides the number of available blocks.
  • Screen on time: To collect data on screen-on time, the SharedPreferences and SharedPreferences.Editor classes were used. SharedPreferences allows persistently storing and retrieving key-value data, such as the total screen-on time. SharedPreferences.Editor is used to modify these values, and apply() is invoked to save the changes.
  • Network traffic: For collecting network traffic data, the TrafficStats class was used, which provides statistics on network data usage, both for Wi-Fi and mobile networks. The following methods were used:
    TrafficStats.getTotalRxBytes(): Returns the total number of bytes received across all network interfaces.
    TrafficStats.getTotalTxBytes(): Returns the total number of bytes sent across all network interfaces.
    TrafficStats.getMobileRxBytes(): Returns the number of bytes received via the mobile network.
    TrafficStats.getMobileTxBytes(): Returns the number of bytes sent via the mobile network.

Appendix B.2. Libraries and Metric Collecting

These libraries and classes were used to capture the key metrics, which were stored in a CSV file for further analysis. Additionally, other standard libraries such as Intent, Handler, TextView, and Toast were employed for general application management and functionality. Table A2 below shows all the variables collected grouped by name, measurement, and a brief reason for each variable.
Table A2. Variables collected indicating the unit of measurement and the reason for their selection.
Variable Name | Measurement Unit | Reason for Collecting
Date and time | Timestamp | Measurement control
Battery level | Percentage (%) | Target variable
Charge status | Yes/No | Can positively or negatively alter the target variable
Battery temperature | Degrees Celsius (°C) | Inversely proportional relationship with the target variable
Applications with the longest usage time (name) ×5 | App name | Reference to the app name
Applications with the longest usage time (seconds) ×5 | Seconds | Inversely proportional relationship with the target variable
Total RAM | MB | Know how much total RAM the mobile device has
Available RAM | MB | Know how much available RAM the mobile device has
RAM used | MB | Inversely proportional relationship with the target variable
Total storage | GB | Know how much total storage the mobile device has
Available storage | GB | Know how much available storage the mobile device has
Most CPU-intensive applications (name) ×5 | App name | Reference to the app name
Applications with highest CPU consumption (percentage) ×5 | Percentage (%) | Inversely proportional relationship with the target variable
Network connection | Wi-Fi or mobile data | Know what type of network the mobile device is connected to
Network type | 4G/5G | If the connection is mobile data, know what type of network it is
Total screen time on | Seconds | Inversely proportional relationship with the target variable
Total network traffic sent | MB | Know whether sending traffic over the network affects the target variable
Total network traffic received | MB | Know whether receiving traffic over the network affects the target variable
Mobile network traffic sent | MB | Know whether sending traffic over the mobile network affects the target variable more or less
Mobile network traffic received | MB | Know whether receiving traffic over the mobile network affects the target variable more or less
Wi-Fi network traffic sent | MB | Know whether sending traffic over the Wi-Fi network affects the target variable more or less
Wi-Fi network traffic received | MB | Know whether receiving traffic over the Wi-Fi network affects the target variable more or less
Device network traffic sent during execution | MB | Know whether sending traffic over the network during execution affects the target variable more or less
Device network traffic received during execution | MB | Know whether receiving network traffic during execution affects the target variable more or less
Age | Integer | Depending on age, cell phones are used more or less, thus affecting the target variable
Gender | String | According to the studies cited in this work, women use cell phones more than men, thus affecting the target variable

References

  1. Yus, F. Smartphone Communication: Interactions in the App Ecosystem; Routledge: London, UK, 2021. [Google Scholar]
  2. Cordella, M.; Alfieri, F.; Clemm, C.; Berwald, A. Durability of smartphones: A technical analysis of reliability and repairability aspects. J. Clean. Prod. 2021, 286, 125388. [Google Scholar] [CrossRef] [PubMed]
  3. Fan, Y.; Yang, C. Competition, product proliferation, and welfare: A study of the US smartphone market. Am. Econ. J. Microecon. 2020, 12, 99–134. [Google Scholar] [CrossRef]
  4. Proske, M.; Poppe, E.; Jaeger-Erben, M. The Smartphone Evolution—An Analysis of the Design Evolution and Environmental Impact of Smartphones; Fraunhofer-Institut für Zuverlässigkeit und Mikrointegration: Berlin, Germany, 2020. [Google Scholar]
  5. Rashid, A.; Zeb, M.A.; Rashid, A.; Anwar, S.; Joaquim, F.; Halim, Z. Conceptualization of smartphone usage and feature preferences among various demographics. Clust. Comput. 2020, 23, 1855–1873. [Google Scholar] [CrossRef]
  6. Varriale, V.; Cammarano, A.; Michelino, F.; Caputo, M. Knowledge management in high-tech products and customer satisfaction: The smartphone industry. J. Open Innov. Technol. Mark. Complex. 2023, 9, 100012. [Google Scholar] [CrossRef]
  7. Dash, P.; Hu, Y.C. How much battery does dark mode save? An accurate OLED display power profiler for modern smartphones. In Proceedings of the 19th Annual International Conference on Mobile Systems, Applications, and Services, Madrid, Spain, 16–17 November 2021; pp. 323–335. [Google Scholar]
  8. Pramanik, P.K.D.; Sinhababu, N.; Mukherjee, B.; Padmanaban, S.; Maity, A.; Upadhyaya, B.K.; Holm-Nielsen, J.B.; Choudhury, P. Power consumption analysis, measurement, management, and issues: A state-of-the-art review of smartphone battery and energy usage. IEEE Access 2019, 7, 182113–182172. [Google Scholar] [CrossRef]
  9. Bhattacharya, S.; Maddikunta, P.K.R.; Meenakshisundaram, I.; Gadekallu, T.R.; Sharma, S.; Alkahtani, M. Deep Neural Networks Based Approach for Battery Life Prediction. Comput. Mater. Contin. 2021, 69, 2599–2615. [Google Scholar] [CrossRef]
  10. Shamsa, E.; Pröbstl, A.; TaheriNejad, N.; Kanduri, A.; Chakraborty, S.; Rahmani, A.M.; Liljeberg, P. Ubar: User-and battery-aware resource management for smartphones. ACM Trans. Embed. Comput. Syst. (TECS) 2021, 20, 1–25. [Google Scholar] [CrossRef]
  11. Singh, M.; Trivedi, J.; Maan, P.; Goyal, J. Smartphone Battery State-of-Charge (SoC) Estimation and battery lifetime prediction: State-of-art review. In Proceedings of the 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 29–31 January 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 94–101. [Google Scholar]
  12. Lombardo, T.; Duquesnoy, M.; El-Bouysidy, H.; Årén, F.; Gallo-Bueno, A.; Jørgensen, P.B.; Bhowmik, A.; Demortière, A.; Ayerbe, E.; Alcaide, F.; et al. Artificial intelligence applied to battery research: Hype or reality? Chem. Rev. 2021, 122, 10899–10969. [Google Scholar] [CrossRef]
  13. Google. Profile Battery Usage with Batterystats and Battery Historian. 2024. Available online: https://developer.android.com/topic/performance/power/setup-battery-historian (accessed on 3 October 2024).
  14. Parasuraman, S. Nomophobia may increase the risk of anxiety, depression, and social isolation. Free. Radicals Antioxid. 2023, 13, 1–2. [Google Scholar] [CrossRef]
  15. Miller, D.; Abed Rabho, L.; Awondo, P.; de Vries, M.; Duque, M.; Garvey, P.; Haapio-Kirk, L.; Hawkins, C.; Otaegui, A.; Walton, S.; et al. The Global Smartphone: Beyond a Youth Technology; UCL Press: London, UK, 2021. [Google Scholar]
  16. Howarth, J. How Many People Own Smartphones? (2024–2029). 2025. Available online: https://explodingtopics.com/blog/smartphone-stats (accessed on 26 September 2024).
  17. Li, T.; Xia, T.; Wang, H.; Tu, Z.; Tarkoma, S.; Han, Z.; Hui, P. Smartphone app usage analysis: Datasets, methods, and applications. IEEE Commun. Surv. Tutor. 2022, 24, 937–966. [Google Scholar] [CrossRef]
  18. Alia, P.A.; S ST, M.; Prayogo, J.S.; Kriswibowo, R.; Kom, S.; Kom, M. Implementation Open Artificial Intelligence ChattGPT Integrated with Whatsapp Bot. Adv. Sustain. Sci. Eng. Technol. (ASSET) 2024, 6, 02401019-01. [Google Scholar] [CrossRef]
  19. Gomez-Uribe, C.A.; Hunt, N. The netflix recommender system: Algorithms, business value, and innovation. ACM Trans. Manag. Inf. Syst. (TMIS) 2015, 6, 1–19. [Google Scholar] [CrossRef]
  20. Watson, A. Deep Learning Techniques for Super-Resolution in Video Games. arXiv 2020, arXiv:2012.09810. [Google Scholar]
  21. Xu, Z.; Zhang, Y.; Andrew, G.; Choquette-Choo, C.A.; Kairouz, P.; McMahan, H.B.; Rosenstock, J.; Zhang, Y. Federated Learning of Gboard Language Models with Differential Privacy. arXiv 2023, arXiv:2305.18465. [Google Scholar]
  22. Mackey, T.K.; Li, J.; Purushothaman, V.; Nali, M.; Shah, N.; Bardier, C.; Cai, M.; Liang, B. Big data, natural language processing, and deep learning to detect and characterize illicit COVID-19 product sales: Infoveillance study on Twitter and Instagram. JMIR Public Health Surveill. 2020, 6, e20794. [Google Scholar] [CrossRef] [PubMed]
  23. Flores-Martin, D.; Laso, S.; Berrocal, J.; Murillo, J.M. Towards digital health: Integrating federated learning and crowdsensing through the Contigo app. SoftwareX 2024, 28, 101885. [Google Scholar] [CrossRef]
  24. Liciotti, D.; Bernardini, M.; Romeo, L.; Frontoni, E. A sequential deep learning application for recognising human activities in smart homes. Neurocomputing 2020, 396, 501–513. [Google Scholar] [CrossRef]
  25. Kothari, P.; Kreiss, S.; Alahi, A. Human trajectory forecasting in crowds: A deep learning perspective. IEEE Trans. Intell. Transp. Syst. 2021, 23, 7386–7400. [Google Scholar] [CrossRef]
  26. Kaluarachchi, T.; Reis, A.; Nanayakkara, S. A review of recent deep learning approaches in human-centered machine learning. Sensors 2021, 21, 2514. [Google Scholar] [CrossRef]
  27. Su, Y.; Wang, C.; Sun, X. Lightweight deep learning model for marketing strategy optimization and characteristic analysis. Comput. Intell. Neurosci. 2022, 2022, 2429748. [Google Scholar] [CrossRef]
  28. Li, A.W.; Bastos, G.S. Stock market forecasting using deep learning and technical analysis: A systematic review. IEEE Access 2020, 8, 185232–185242. [Google Scholar] [CrossRef]
  29. Ferrari, A.; Micucci, D.; Mobilio, M.; Napoletano, P. Deep learning and model personalization in sensor-based human activity recognition. J. Reliab. Intell. Environ. 2023, 9, 27–39. [Google Scholar] [CrossRef]
  30. Google. Android Debug Bridge (adb). Available online: https://developer.android.com/tools/adb (accessed on 20 November 2024).
  31. Digibites. Accu Battery. Available online: https://play.google.com/store/apps/details?id=com.digibites.accubattery (accessed on 20 November 2024).
  32. One. BatteryOne: Battery. Available online: https://play.google.com/store/apps/details?id=com.oneapps.batteryone (accessed on 20 November 2024).
  33. Paget96. Battery Guru. Available online: https://play.google.com/store/apps/details?id=com.paget96.batteryguru (accessed on 20 November 2024).
  34. Google. Digital Wellbeing. Available online: https://play.google.com/store/apps/details?id=com.google.android.apps.wellbeing (accessed on 20 November 2024).
  35. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar]
  36. Hahnloser, R.H.; Sarpeshkar, R.; Mahowald, M.A.; Douglas, R.J.; Seung, H.S. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 2000, 405, 947–951. [Google Scholar] [CrossRef] [PubMed]
  37. Duchi, J.; Hazan, E.; Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 2011, 12, 2121–2159. [Google Scholar]
  38. Rentero-Trejo, R.; Flores-Martín, D.; Galán-Jiménez, J.; García-Alonso, J.; Murillo, J.M.; Berrocal, J. Using federated learning to achieve proactive context-aware IoT environments. J. Web Eng. 2022, 21, 53–74. [Google Scholar] [CrossRef]
  39. Xie, Y.; Jin, M.; Zou, Z.; Xu, G.; Feng, D.; Liu, W.; Long, D.D.E. Real-Time Prediction of Docker Container Resource Load Based on a Hybrid Model of ARIMA and Triple Exponential Smoothing. IEEE Trans. Cloud Comput. 2022, 10, 1386–1401. [Google Scholar] [CrossRef]
  40. Tang, X.; Liu, Q.; Dong, Y.; Han, J.; Zhang, Z. Fisher: An efficient container load prediction model with deep neural network in clouds. In Proceedings of the 2018 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), Melbourne, Australia, 11–13 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 199–206. [Google Scholar]
  41. Ding, M.; Wang, T.; Wang, X. Establishing smartphone user behavior model based on energy consumption data. ACM Trans. Knowl. Discov. Data (TKDD) 2021, 16, 1–40. [Google Scholar] [CrossRef]
  42. Neto, A.S.B.; Farias, F.; Mialaret, M.A.T.; Cartaxo, B.; Lima, P.A.; Maciel, P. Building energy consumption models based on smartphone user’s usage patterns. Knowl.-Based Syst. 2021, 213, 106680. [Google Scholar] [CrossRef]
  43. Xia, T.; Li, Y.; Feng, J.; Jin, D.; Zhang, Q.; Luo, H.; Liao, Q. DeepApp: Predicting personalized smartphone app usage via context-aware multi-task learning. ACM Trans. Intell. Syst. Technol. (TIST) 2020, 11, 1–12. [Google Scholar] [CrossRef]
  44. Çiçek, E.; Gören, S. Smartphone power management based on ConvLSTM model. Neural Comput. Appl. 2021, 33, 8017–8029. [Google Scholar] [CrossRef]
Figure 1. Android application flowchart. It describes the information flow of the data collection application from the time the user installs it until its data are generated and transferred to a server to generate his/her model.
Figure 2. Correlation matrix example for OnePlus LE2123. It indicates how variables affect each other. It is useful to identify redundancies, select relevant features, and avoid multicollinearity problems in the models.
Figure 3. Network traffic variables analysis and the correlation among them.
Figure 4. DNN predictions for OnePlus_LE2113 considering the predictions and real values.
Figure 5. LSTM predictions for OnePlus_LE2113 considering the predictions and real values.
Figure 6. OnePlus_LE2113 results comparison. (a) DNN, LSTM, linear regression and decision tree data results. (b) DNN and LSTM loss for training and validation.
Figure 7. DNN predictions for OnePlus_LE2123 considering the predictions and real values.
Figure 8. LSTM predictions for OnePlus_LE2123 considering the predictions and real values.
Figure 9. OnePlus_LE2123 results comparison.
Figure 10. DNN predictions for Samsung_SM-A226B considering the predictions and real values.
Figure 11. Samsung_SM-A226B results comparison.
Figure 12. LSTM predictions for Samsung_SM-A226B considering the predictions and real values.
Figure 13. DNN predictions for POCO_M2102J20SG considering the predictions and real values.
Figure 14. LSTM predictions for POCO_M2102J20SG considering the predictions and real values.
Figure 15. POCO_M2102J20SG results comparison.
Figure 16. DNN predictions for Samsung_SM-G991B considering the predictions and real values.
Figure 17. LSTM predictions for Samsung_SM-G991B considering the predictions and real values.
Figure 18. Samsung_SM-G991B results comparison.
Table 1. CSV example (1/2) detailing some selected variables such as battery level, temperature, or the contemplated application.
Datetime | Battery Level (%) | Charge Status | Battery Temperature (Degrees Celsius) | Applications with the Highest Usage Time (Name) | Applications with the Highest Usage Time (Seconds)
16/05/2024 13:00 | 70 | no | 29.9 | org.telegram.messenger | 189
16/05/2024 13:29 | 66 | no | 29.3 | com.whatsapp | 236
16/05/2024 13:34 | 66 | no | 29.8 | com.whatsapp | 236
16/05/2024 13:49 | 65 | no | 28.2 | com.whatsapp | 266
16/05/2024 13:50 | 65 | no | 28.8 | com.whatsapp | 295
16/05/2024 14:10 | 64 | no | 28.2 | com.whatsapp | 295
16/05/2024 14:10 | 64 | no | 28.2 | com.whatsapp | 295
16/05/2024 14:20 | 63 | no | 28.1 | com.whatsapp | 295
16/05/2024 14:30 | 63 | no | 28.2 | com.whatsapp | 295
16/05/2024 14:40 | 62 | no | 28.1 | com.whatsapp | 295
16/05/2024 14:50 | 62 | no | 28.1 | com.whatsapp | 295
Table 2. CSV example (2/2) detailing selected variables such as RAM usage, storage, network traffic, and screen-on time.
Total RAM Memory (MB) | Available RAM Memory (MB) | Used RAM Memory (MB) | Total Storage (GB) | Available Storage (GB) | Used Storage (GB) | Network Connection | Total Screen-on Time (Seconds) | Total Network Traffic Received (MB) | Total Network Traffic Sent (MB)
11,243 | 5822 | 5421 | 221.28 | 101.19 | 120.09 | wi-fi | 77 | 0 | 0
11,243 | 4992 | 6251 | 221.28 | 99.2 | 122.08 | wi-fi | 333 | 23 | 6
11,243 | 5140 | 6103 | 221.28 | 99.2 | 122.08 | wi-fi | 333 | 23 | 6
11,243 | 5004 | 6239 | 221.28 | 99.15 | 122.13 | wi-fi | 333 | 24 | 7
11,243 | 4781 | 6462 | 221.28 | 99.01 | 122.27 | wi-fi | 333 | 30 | 7
11,243 | 5058 | 6185 | 221.28 | 99.07 | 122.21 | wi-fi | 333 | 31 | 7
11,243 | 5057 | 6186 | 221.28 | 99.07 | 122.21 | wi-fi | 333 | 31 | 7
11,243 | 5045 | 6198 | 221.28 | 99.07 | 122.21 | wi-fi | 333 | 31 | 7
11,243 | 5039 | 6204 | 221.28 | 99.07 | 122.21 | wi-fi | 333 | 32 | 7
11,243 | 4982 | 6261 | 221.28 | 99.07 | 122.21 | wi-fi | 333 | 32 | 7
11,243 | 4633 | 6610 | 221.28 | 99.07 | 122.21 | wi-fi | 333 | 33 | 8
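Tables 1 and 2 show the two halves of the same per-sample CSV export produced by the data collector. The following is a minimal, illustrative sketch of how such an export could be loaded for analysis; it is not the authors' code, and the file name and column identifiers are assumptions chosen for readability.

```python
# Minimal, illustrative sketch: loading a collector export with the layout
# of Tables 1 and 2. "collector_export.csv" and the column names below are
# assumptions, not the authors' actual file or headers.
import pandas as pd

columns = [
    "datetime", "battery_level_pct", "charge_status", "battery_temp_c",
    "top_app_name", "top_app_seconds",
    "ram_total_mb", "ram_available_mb", "ram_used_mb",
    "storage_total_gb", "storage_available_gb", "storage_used_gb",
    "network_connection", "screen_on_s", "net_rx_mb", "net_tx_mb",
]

df = pd.read_csv(
    "collector_export.csv",
    names=columns, header=0,             # replace the file's own header row
    parse_dates=["datetime"], dayfirst=True,
    thousands=",",                       # RAM totals such as "11,243"
)

df = df.sort_values("datetime")
df["battery_level_pct"] = df["battery_level_pct"].clip(0, 100)  # sanity bound
print(df.head())
```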
Table 3. Data collector application battery consumption.
Metric | Value
Device estimated power use | 0.07%
Foreground activity | 4 times over 25 h 1 m 6 s 880 ms
CPU user time | 24 s 360 ms
Table 4. Multivariate analysis of the battery level.
Variable | Coefficient | Std. Error | p-Value
Charge status | 9.340 | 1.879 | 7.3 × 10⁻⁷
Battery temperature | −0.826 | 0.186 | 9.5 × 10⁻⁶
Applications with the highest usage time name (1) | 0.242 | 16.404 | 0.0499
Applications with the highest usage time seconds (1) | 0.7 × 10⁻³ | 0.000 | 1.9 × 10⁻⁴
Applications with the highest usage time name (2) | 1.439 | 0.709 | 0.042
Applications with the highest usage time seconds (2) | −0.006 | 0.001 | 6.28 × 10⁻⁸
Applications with the highest usage time name (3) | 1.433 | 0.621 | 0.021
Applications with the highest usage time seconds (3) | −0.023 | 0.003 | 2.8 × 10⁻¹⁴
Applications with the highest usage time name (4) | −0.276 | 0.453 | 0.050
Applications with the highest usage time seconds (4) | 0.009 | 0.007 | 0.019
Applications with the highest usage time name (5) | −0.559 | 0.235 | 0.017
Applications with the highest usage time seconds (5) | 0.009 | 0.008 | 7.8 × 10⁻⁴
RAM used | 0.003 | 0.001 | 0.074
Storage used | −0.336 | 0.100 | 0.083
Applications with the highest CPU consumption name (1) | 0.105 | 16.394 | 0.099
Applications with the highest CPU consumption percentage (1) | 0.013 | 0.092 | 0.088
Applications with the highest CPU consumption name (2) | −1.316 | 0.700 | 0.060
Applications with the highest CPU consumption percentage (2) | 0.666 | 0.126 | 1.4 × 10⁻⁷
Applications with the highest CPU consumption name (3) | −0.802 | 0.616 | 0.109
Applications with the highest CPU consumption percentage (3) | 1.131 | 0.259 | 1.3 × 10⁻⁵
Applications with the highest CPU consumption name (4) | 0.250 | 0.455 | 0.058
Applications with the highest CPU consumption percentage (4) | −0.699 | 0.432 | 0.011
Applications with the highest CPU consumption name (5) | 0.475 | 0.233 | 0.042
Applications with the highest CPU consumption percentage (5) | −2.195 | 0.056 | 8.6 × 10⁻⁵
Network connection | 29.301 | 18.203 | 0.061
Network type | −13.121 | 7.846 | 0.095
Total screen time on | −0.006 | 0.003 | 0.031
Total network traffic received | 9.1 × 10⁻⁵ | 9.3 × 10⁻⁵ | 0.033
Total network traffic sent | 0.5 × 10⁻³ | 0.001 | 0.049
Age | −0.2553 | 0.214 | 0.233
Gender | 2.3 × 10⁻⁵² | 1.1 × 10⁻⁴⁰ | 0.999
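The coefficients, standard errors, and p-values in Table 4 are the kind of output produced by an ordinary least squares fit of the battery level against the collected variables. The sketch below shows how such a fit could be obtained with statsmodels; the column names, the yes/no encoding of the charge status, and the reduced feature set are illustrative assumptions rather than the authors' exact pipeline.

```python
# Illustrative multivariate regression in the spirit of Table 4.
# Column names and encodings are assumptions; only a subset of the
# variables listed in Table 4 is included here.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("collector_export.csv")  # hypothetical export, see Tables 1 and 2

# Encode categorical fields as numbers before fitting.
df["charge_status"] = (df["charge_status"] == "yes").astype(int)
for col in ["top_app_name", "network_connection"]:
    df[col] = df[col].astype("category").cat.codes

features = ["charge_status", "battery_temp_c", "top_app_name", "top_app_seconds",
            "ram_used_mb", "storage_used_gb", "network_connection",
            "screen_on_s", "net_rx_mb", "net_tx_mb"]

X = sm.add_constant(df[features])
y = df["battery_level_pct"]

ols = sm.OLS(y, X).fit()
print(ols.summary())  # coefficients, standard errors, p-values
```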
Table 5. DNN network structure detailing the number of neurons, the activation function, and the regularization selected.
DNN Network Structure
Layer | Neurons | Activation | Regularization
Input dense layer | 17 | ReLU | —
First dense layer | 128 | ReLU | L2 (0.001)
Second dense layer | 64 | ReLU | L2 (0.001)
Third dense layer | 32 | ReLU | L2 (0.001)
Output dense layer | 1 | — | L2 (0.001)
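The structure in Table 5 maps directly onto a feed-forward Keras model. The sketch below is a minimal reconstruction under stated assumptions: the 17-feature input, the Adam optimizer, and the MSE loss are assumptions, while the layer widths, ReLU activations, and L2(0.001) regularization follow the table.

```python
# Minimal sketch of the DNN described in Table 5 (illustrative, not the
# authors' code). Optimizer and loss are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

l2 = regularizers.l2(0.001)

dnn = tf.keras.Sequential([
    layers.Input(shape=(17,)),                  # 17 input features (assumed)
    layers.Dense(17, activation="relu"),        # input dense layer (Table 5)
    layers.Dense(128, activation="relu", kernel_regularizer=l2),
    layers.Dense(64, activation="relu", kernel_regularizer=l2),
    layers.Dense(32, activation="relu", kernel_regularizer=l2),
    layers.Dense(1, kernel_regularizer=l2),     # linear output: predicted battery level
])

dnn.compile(optimizer="adam", loss="mse", metrics=["mae"])
dnn.summary()
```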
Table 6. LSTM network structure detailing the number of neurons, the activation function, and the regularization selected.
LSTM Network Structure
Layer | Neurons | Activation | Regularization
Input dense layer | 17 | ReLU | —
First LSTM layer | 128 | ReLU | L2 (0.001)
Second dense layer | 64 | ReLU | L2 (0.001)
Third dense layer | 32 | ReLU | L2 (0.001)
Output dense layer | 1 | — | L2 (0.001)
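Table 6 differs from Table 5 mainly in replacing the first hidden layer with a 128-unit LSTM, so the model consumes short windows of past samples rather than single rows. The following sketch reflects that structure; the window length, optimizer, and loss are assumptions not given in the table.

```python
# Minimal sketch of the LSTM model in Table 6 (illustrative only).
# WINDOW and the training settings are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, regularizers

l2 = regularizers.l2(0.001)
WINDOW = 10  # number of past samples per input sequence (assumed)

lstm = tf.keras.Sequential([
    layers.Input(shape=(WINDOW, 17)),            # 17 features per time step (assumed)
    layers.Dense(17, activation="relu"),         # input dense layer, applied per time step
    layers.LSTM(128, activation="relu", kernel_regularizer=l2),
    layers.Dense(64, activation="relu", kernel_regularizer=l2),
    layers.Dense(32, activation="relu", kernel_regularizer=l2),
    layers.Dense(1, kernel_regularizer=l2),      # predicted battery level
])

lstm.compile(optimizer="adam", loss="mse", metrics=["mae"])
lstm.summary()
```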
Table 7. Mobile device features.
Device | Manufacturer | Source | Release | Android | Processor | RAM (GB) | Storage (GB)
OnePlus_LE2113 | OnePlus | Cáceres (Spain) | 2021 | 14.0 | Qualcomm Snapdragon 888 | 8 | 128
OnePlus_LE2123 | OnePlus | Cáceres (Spain) | 2021 | 14.0 | Qualcomm Snapdragon 888 | 12 | 256
Samsung_SM-A226B | Samsung | Cáceres (Spain) | 2021 | 10.0 | MediaTek Dimensity 700 | 4 | 64
POCO_M2102J20SG | Pocophone | Cáceres (Spain) | 2021 | 11.0 | Qualcomm Snapdragon 860 | 8 | 256
Samsung_SM-G991B | Samsung | Cáceres (Spain) | 2021 | 9.0 | Exynos 2100 | 8 | 128
Table 8. OnePlus_LE2113 monthly app frequency and usage time.
App | Freq | Usage
com.instagram.android | 375 | 33 h 4 min
tv.twitch.android.app | 30 | 11 h 22 min
com.whatsapp | 30 | 10 h 20 min
org.telegram.messenger | 286 | 8 h 47 min
com.twitter.android | 255 | 7 h 45 min
Table 9. OnePlus_LE2123 monthly app frequency and usage time.
App | Freq | Usage
com.google.android.youtube | 209 | 112 h 7 min
com.zhiliaoapp.musically | 223 | 18 h 5 min
com.google.android.apps.maps | 160 | 12 h 55 min
com.android.launcher | 109 | 11 h 22 min
com.ryanair.cheapflights | 62 | 8 h 47 min
Table 10. Samsung_SM-G991B 15-day app frequency and usage time.
App | Freq | Usage
com.instagram.android | 95 | 18 h 15 min
com.whatsapp | 107 | 7 h 45 min
com.android.chrome | 62 | 3 h 45 min
com.linkedin.android | 22 | 3 h 45 min
com.twitter.android | 46 | 1 h 45 min
Table 11. POCO_M2102J20SG monthly app frequency and usage time.
App | Freq | Usage
com.supercell.brawlstars | 187 | 27 h 23 min
com.zhiliaoapp.musically | 190 | 19 h 38 min
com.supercell.clashroyale | 182 | 17 h 34 min
com.whatsapp | 337 | 15 h 30 min
com.instagram.android | 351 | 13 h 26 min
Table 12. Samsung_SM-G991B monthly app frequency and usage time.
App | Freq | Usage
com.instagram.android | 387 | 25 h 50 min
com.whatsapp | 496 | 19 h 38 min
com.samsung.android.incallui | 63 | 17 h 34 min
tv.twitch.android.app | 39 | 17 h 3 min
com.google.android.youtube | 203 | 10 h 20 min
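Per-app summaries like those in Tables 8–12 can be approximated by grouping the per-sample logs on the most-used application. The sketch below is purely illustrative and reuses the hypothetical column names from the earlier loading example; it is not the procedure used to produce the published tables.

```python
# Illustrative per-app summary (frequency and usage time) from a
# hypothetical collector export; column names are assumptions.
import pandas as pd

df = pd.read_csv("collector_export.csv")

summary = (
    df.groupby("top_app_name")
      .agg(freq=("top_app_name", "size"),             # samples where app was the top app
           usage_seconds=("top_app_seconds", "max"))  # cumulative usage counter
      .sort_values("usage_seconds", ascending=False)
)
summary["usage"] = pd.to_timedelta(summary["usage_seconds"], unit="s")
print(summary.head(5))
```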