1. Introduction
Dew computing is conceived as the bottom structural layer of the existing distributed computing hierarchy [1], which is composed of the so-called cloud-edge/fog computing services. In such a hierarchy, the most powerful and expensive storage and computation resources reside in cloud data centers. Hence, cloud services are accessible worldwide, in contrast to services offered in the next layer, represented by edge/fog computing infrastructure, which is expected to satisfy storage and computing needs within a metropolitan area. In the last decade, many research efforts have been devoted to the edge/fog computing paradigm [2,3,4]. The high-latency internetwork communication involved in using distant cloud services limits applications with delay-sensitive processing requirements, such as real-time analytics. In response to this limitation, edge/fog computing solutions emerged to bring computing and storage resources closer to where computing needs are generated [5]. In this way, long communication latencies are mitigated by reducing the usage of resources from distant cloud data centers. However, when edge/fog resources are overloaded, i.e., when their capacity is exceeded by peaks of high-demand applications, the layer relies on its connection with cloud resources to offload tasks, exposing its dependency on the internetwork back-haul. The dew layer complements the bottom of this hierarchical resource organization and aims at enhancing user experience by exploiting resources near the end-user with minimal internet access [6]. Intuitively, resources in this layer are more limited, though accessible with lower latency than those of the layers above, and their scope is a local area network or even a user's local machine. A key aspect of providing dew layer services is to have dew devices capable of satisfying user requirements, at least in a degraded mode, with complete autonomy to decide when to use services provided by the layers above [7].
Figure 1 illustrates the user scope, resources and communication features of these complementary distributed computing paradigms.
In its beginnings, dew computing was related to maintaining copies of websites and databases (dew sites) on users' local computers (dew servers) to support an internet-free web surfing experience [8,9]. Dew servers collaborate with the cloud to perform tasks such as synchronization, restoration and personalization of dew site content located in dew computing devices. The latter are represented by human-operated computers such as desktop computers, laptops and mobile devices, including tablets and smartphones [6]. Recent conceptualizations of dew computing envisage an enriched functionality for the layer that goes beyond exploiting the storing/communicating capabilities of dew devices. In [6], the importance of active cooperation among nearby dew devices for providing task execution services is highlighted. In [10], cooperation of this kind is proposed for processing IoT data streams. The personal smart traffic lights prototype presented in [11] is a concrete application example in the intelligent transportation domain. Moving towards processing data by prioritizing the exploitation of on-device/local network resources aims to improve the energy efficiency, trust, security, resilience to network outages, and response time of application execution approaches that primarily rely on remote cloud services [12].
Dew computing implementations have broad applicability in indoor/outdoor ordinary-life scenarios, including home automation, smart gardening, green computation and smart health care, to mention a few examples. The integration of mobile devices as first-class computing providers of dew environments is the rule rather than the exception for several reasons: (1) they are among the most common computational devices in the world, and their number increases year by year to the point that it is estimated that by 2021 there will be 1.5 mobile-connected devices per capita (http://tinyurl.com/mokcut3); (2) their computing, storage and sensing capabilities are enhanced with every new generation, and smartphones are nowadays capable of running resource-intensive routines, e.g., processing medical images [13] or solving complex engineering equations [14]; (3) they are in close proximity to the end user; and (4) they can operate continuously for several hours, days or even weeks unplugged from the electricity grid, i.e., on the power of their Li-ion batteries alone.
In line with the described potential, there are many efforts that advocate making mobile devices cooperate in task execution [15]. To accomplish this, it is necessary to design task scheduling mechanisms, i.e., the logic by which tasks or data are distributed among participant battery-driven mobile nodes to be executed or processed, respectively. Designing task schedulers that scavenge the resources of a group of nearby mobile devices entails high complexity due to the highly dynamic and heterogeneous nature of resource availability [15,16]. In other words, the resource quantification used by task scheduling mechanisms is difficult to tackle effectively: mobile devices can change their physical location, they are non-dedicated by nature (dew tasks and the device owner's applications must compete for the same resources) and they have energy limitations because they are battery-powered. In particular, quantifying the resources of a device based on its future energy availability is of utmost importance, not only to avoid incomplete task assignments caused by battery depletion events, but also to encourage dew device owners to participate in dew computing while taking into account the battery requirements of their future interactions.
In this context, we propose a model to quantify future energy availability in mobile devices that takes the device owner's usage profile as input, an open problem seldom explored in the literature [17]. To limit the scope of our research, we concentrate on producing a battery prediction model suitable for prospective designs of practical energy-aware schedulers that aim at scavenging the computing capabilities of mobile devices.
Basically, our proposed model utilizes information on the owner's past activity (charging state, screen state and brightness level, application execution, activated radios, etc.) to predict the remaining battery within a time window of several hours ahead. For the model construction and the assessment of its predictive accuracy, we employed real traces of mobile device activity from the Device Analyzer data-set [18]. At present, this is the largest fine-grained mobile device usage data-set, including traces from more than 31,000 users worldwide and different Android-based mobile device brands. Based on this data-set, comparisons against a related approach published in [19] show the superior performance of our model over 23 activity traces belonging to different mobile device users. This represents the major contribution of this paper w.r.t. our conference version presented at ITNG 2018 [20], where only one activity log was used to illustrate the proposed model. Another difference is the sampling time range used in the evaluation. While in the conference version the models were run over three months of a single user's activity trace, in this version we included activity traces of 23 users for time ranges between 13 and 25 months. Furthermore, contextual information on the applicability of the approach is presented in a new related works section.
The organization of the paper is as follows. Section 2 discusses relevant works related to mobile device resource quantification in the context of task scheduling mechanisms, which constitute a vital component for exploiting the computing potential present in a dew cluster. Section 3 explains our model to predict energy availability in detail. Then, Section 4 evaluates the accuracy of the model in terms of the mean squared error metric, considering activity logs from several users from the Device Analyzer data-set and an alternative model from the literature [19]. Finally, Section 5 summarizes the implications of our findings and delineates future works and improvements.
3. Approach
Before diving into the details of our battery prediction model based on device usage patterns, let us first motivate, with two ordinary-life situations, the value of having a dew computing solution for hour-wise predictions of energy availability. Then, we illustrate how the model would fit into a dew computing architecture.
Drew and Jonathan own a company that offers their customers a service consisting of finding houses to buy and remodel. Every time a couple requests their services, Drew and Jonathan select a list of potential houses based on pre-requisites (budget, dimensions, proximity to desired places, and so on) and make an appointment with the couple to visit these houses. Drew and Jonathan employ on-site software that is capable of interactively rendering remodeling options via augmented reality (AR), so a rich model is built as the camera of a smartphone films different parts and rooms of a house. The software is computationally expensive, and thus it might take advantage of nearby computing devices (smartphones) to interactively build the AR model. As the houses being visited often have no electricity or internet connection, the smartphones' battery time must be both used wisely and collectively scavenged.
Under this scenario, if Drew and Jonathan are to meet a couple at time t today, it is desirable that the available battery lifetime of their smartphones and those of the target couple be predicted in advance based on how each user employs their device. This way, not only is the computing power available for the AR software at time t known beforehand, but the various smartphones can also be wisely assigned parallel tasks upon rendering remodeling options, so that the remaining battery after all planned houses are visited is affected evenly among the participating smartphones. For obvious reasons, however, the dew scheduler might decide to put more burden (in terms of assigned tasks to execute) on Drew and Jonathan's mobile phones rather than on those of their clients.
Let us consider another relevant situation. A tourism company offers customers a trekking service in beautiful but isolated geographic places across a country. Expeditions to a particular place depart at predefined days and hours (e.g., t) from a fixed meeting point, and the time required to reach a place from the meeting point is known beforehand (e.g., r hours). Again, internet connectivity in such places is absent. Moreover, the company provides customers with novel AR software that allows users to augment the scene with information to enhance the trekking experience; as users point their smartphones at local vegetation, the software uses image recognition algorithms and augments images with information and interactive animations regarding, e.g., which animals' food chains a specific plant belongs to. Similar to the previous situation, the dew scheduler associated with the software might predict battery lifetime in participating smartphones at time t + r on a specific day, and hence distribute parallel tasks based on this information. In this case, the scheduler might also want to put the computing burden on some smartphones while saving battery on a few others for emergency calls that might need to be issued.
Figure 2 illustrates the asynchronous interactions in time between the components of a dew-cloud architecture instantiated for our approach. From time t to t2, a synchronization/training phase takes place, while moment t3 corresponds to dew devices cooperating, i.e., a dew computing phase. Concretely, at time t, a dew computing device synchronizes with the cloud, and the last registered usage pattern data chunk is derived from past owner activities. Chunk data are the relevant features that we detail later in this section. Due to limited storage capacity, the dew device might be configured to save up to a certain number of past chunks; a synchronization operation then allows the device to erase old chunks. Cloud servers, however, have a large capacity to store many chunks, i.e., long periods of owner activity, which are used to (re)train an individual battery prediction model for the dew device owner. At time t1, while the dew computing device continues registering mobile device owner activity locally, the cloud trains a battery prediction model. At t2, upon a new dew device-cloud synchronization, the dew device gets the trained battery model from the cloud.
Now, let us suppose the above components' interaction can be replicated by any number of dew devices simultaneously. Besides, the training result of each dew device prediction model could be different, in the sense that one could generalize the battery behavior better than others. Finally, t3 represents dew computing scenarios in which a cluster of dew devices is required to cooperate, for instance, to execute tasks of a compute-intensive application like the augmented reality scenarios described above. Cooperation is performed with future battery information obtained, e.g., via a micro-service running in the dew device, which invokes an instance of our battery prediction model.
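To make this timeline concrete, the following minimal Python sketch mimics the device-cloud interaction just described. It is an illustration under assumptions: all names (DewDevice, CloudBackend, train_battery_model) are invented for exposition and do not correspond to an actual implementation.

```python
# Illustrative sketch of the dew-cloud interaction timeline (Figure 2).
# Names here are hypothetical, not the paper's actual components.

def train_battery_model(chunks):
    """Stub trainer: a real backend would fit the model of Section 3."""
    mean_level = sum(c["battery_level"] for c in chunks) / len(chunks)
    return lambda horizon_minutes: mean_level  # constant predictor stub

class CloudBackend:
    def __init__(self):
        self.chunks = []      # long-term storage of usage-pattern chunks
        self.model = None

    def receive_chunks(self, chunks):       # time t: synchronization
        self.chunks.extend(chunks)

    def retrain(self):                      # time t1: cloud-side training
        self.model = train_battery_model(self.chunks)

class DewDevice:
    def __init__(self, cloud):
        self.cloud = cloud
        self.local_chunks = []              # limited on-device storage
        self.model = None

    def log_activity(self, chunk):          # continuous local registration
        self.local_chunks.append(chunk)

    def synchronize(self):                  # times t and t2
        self.cloud.receive_chunks(self.local_chunks)
        self.local_chunks.clear()           # old chunks can be erased
        if self.cloud.model is not None:    # t2: fetch the trained model
            self.model = self.cloud.model

# t3: a dew scheduler queries the device, e.g., via a local micro-service.
cloud = CloudBackend()
device = DewDevice(cloud)
device.log_activity({"battery_level": 80})
device.synchronize()      # t
cloud.retrain()           # t1
device.synchronize()      # t2
print(device.model(60))   # predicted battery level 60 minutes ahead
```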
That said, in the subsequent sections we focus on explaining the underpinnings of the proposed battery prediction model, which is the focus of this paper. In summary, our approach comprises an ensemble of several machine learning algorithms trained using features resulting from a feature selection analysis performed over a subset of user activity events. Activity events are derived from the logs of real mobile device users. Features belong to different categories, including energy (e.g., when the user plugs/unplugs the mobile phone from a wall socket or USB), network (e.g., Wifi scanning/connection/disconnection events) and application (e.g., screen on/off, launching/closing an application).
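As a hedged illustration of what such a regression ensemble over the selected features could look like (the specific learners chosen below are our assumption, not necessarily the paper's final configuration), consider this minimal scikit-learn sketch:

```python
# Illustrative only: ensembling two regressors over the selected features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

# Toy feature matrix: [day_of_week, minute, external_supply, screen_on]
X = np.array([[1, 600, 0, 1], [1, 660, 0, 1], [1, 720, 1, 0]])
y = np.array([80.0, 72.0, 75.0])   # battery level (response variable)

ensemble = VotingRegressor([
    ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("linear", LinearRegression()),
])
ensemble.fit(X, y)
print(ensemble.predict([[1, 780, 1, 0]]))  # predicted level at a later minute
```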
3.1. Preliminary Data Analysis and Feature Selection
In machine learning, feature selection is the process of defining a subset of relevant features (variables, a.k.a. predictors) for use in model construction. Feature selection pursues two goals, namely reducing the number of features to avoid overfitting and improve the generalization of the models built, and gaining a better understanding of the relationship between the features and the response variables [28]. To come up with a feature selection approach that is general enough for the problem domain at hand and, at the same time, built on real user data, we utilize the Device Analyzer data-set [18]. This data-set is provided by the University of Cambridge and represents to date the largest collection of activity traces from real Android-powered mobile device users; at the time of writing, it has 31,455 contributors worldwide.
For the sake of illustrating the feature selection approach, the values shown below correspond to a single user from the Device Analyzer data-set, who registered activity traces for over 6 weeks. The user activity data basically consists of per-user files with records composed of an id field, a milliseconds field, a time-stamp, and a data field typically represented by a data category, subcategories and associated values. To ease its analysis, we first defined a structured format for it: the data-set was split into states, where each state holds the minute-wise value of each sensor in a device. A mobile device state changes as a result of an event in time that changes the value of some sensor. For example, a change in the battery level can be defined as an event, which triggers a new state of the mobile device in which the battery level is modified and all other features remain the same. Initially, from the formatted data, the following combination of features was considered:
Day of week: values from 0 (Sunday) to 6 (Saturday) indicate the week day on which the event was registered. Type: integer.
Minute: values from 1 to 1440 represent the minute of the day in which the event occurred. Type: integer.
External supply: takes a 1/0 value indicating whether or not the device is plugged to an external energy supply, e.g., AC adapter or USB connection. Type: Boolean.
Brightness level: values from 0 to 100 indicate the screen brightness percentage. Type: integer.
Screen on/off: takes a 1/0 value indicating whether the device screen is active or inactive, respectively. Type: Boolean.
Connected: takes a 1/0 value indicating whether or not the device is connected to a 3G/4G network. Type: Boolean.
Connected to Wifi: takes a 1/0 value indicating whether or not the device is connected to a Wifi network. Type: Boolean.
Temperature: indicates battery temperature. Type: integer.
Voltage: indicates battery voltage. Type: integer.
Battery level: values from 0 to 100 indicate the remaining battery level. Type: integer.
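For illustration purposes, a minute-wise state over these features could be represented as below; this is a sketch, and the field names are ours rather than the data-set's actual schema.

```python
# Hypothetical representation of one minute-wise device state built from
# the formatted traces; field names are illustrative only.
from dataclasses import dataclass

@dataclass
class DeviceState:
    day_of_week: int        # 0 (Sunday) .. 6 (Saturday)
    minute: int             # 1 .. 1440, minute of the day
    external_supply: bool   # plugged to AC/USB or not
    brightness_level: int   # 0 .. 100, screen brightness percentage
    screen_on: bool
    connected_3g4g: bool
    connected_wifi: bool
    temperature: int        # battery temperature
    voltage: int            # battery voltage
    battery_level: int      # 0 .. 100, remaining battery (response variable)

# A battery-level event triggers a new state with only that field changed:
prev = DeviceState(2, 750, False, 60, True, True, False, 30, 3900, 74)
curr = DeviceState(2, 751, False, 60, True, True, False, 30, 3900, 73)
```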
The data-set itself contains several other features, such as those related to location and application usage.
Figure 3 shows a raw extract of a few records from an activity trace.
Nevertheless, these features were not considered as relevant to battery level modeling as the ones previously listed. First, it is true that by knowing the location of users in each state, it might be possible to infer whether they are at home, in which case the probability of charging the mobile phone increases. However, that information can also be obtained from the charging pattern via the external supply feature. Secondly, application usage is highly seasonal, because some applications tend to be used only in certain periods of time (e.g., flight-support or tourism applications). In addition, users might change their applications frequently due to, e.g., updates or replacements. Therefore, it is very difficult to generalize a model based on this feature, and it might introduce extra noise.
Table 1 shows an extract from the formatted data-set for one particular mobile device user, i.e., the one used in [20] to illustrate the model. Furthermore, Figure 4 depicts the battery level variation over time for this user (first 5 days of the sample). It is possible to see that the resulting curve is not exactly periodic, since it takes different shapes and does not have a pre-set behavior. However, it is worth noting that there is a visual resemblance to a sinusoidal curve, because it continuously goes up and down. In fact, Figure 5 (left) shows an even greater resemblance when the battery level is averaged per day. Considering this observation, which applied to many other users in the data-set as well upon producing preliminary visual representations of battery usage, a new feature representing the sine movement was added to the feature list:

sine = batteryLevelMean + amplitude · sin(2π · minute/minsPerDay)    (1)

where amplitude describes how much the curve goes up and down, minute is the minute of the day, minsPerDay is the total number of minutes in a day (i.e., 1440), and batteryLevelMean is the average per-day battery level in the data-set. The maximum amplitude is 50, since a bigger value would make the curve go above the maximum battery level or below the minimum battery level. Finally, it is worth pointing out that the only variable needed to calculate the sine at prediction time is minute, which is easily accessible in every mobile device.
Figure 5 (right) shows the resulting curve compared to the real one (on the left) that we previously obtained using real battery level samples.
Likewise, as Figure 4 shows, during night hours the battery level tends to go up, which means that the device remains plugged to an energy supply. Such a battery level pattern, i.e., the zenith at night when the device is usually connected, and the decrement during daytime caused by device usage, can be modeled with a cosine function. Concretely, based on this reasoning, we included the extra feature that appears in Equation (2):

cosine = batteryExternalMean + amplitude · cos(2π · minute/minsPerDay)    (2)

In this case the amplitude has a maximum value of 0.5, because the external supply feature can only take the values 0 or 1. In turn, batteryExternalMean is the average of all the external supply feature values in the user trace data.
Figure 6 (right) shows the curve obtained from this calculation, and
Figure 6 (left) shows the real averaged curve.
In addition, to capture the battery variation (derivative) from event to event, we added a previous battery level feature to the feature set, given by the battery level in the previous sampling time-step.
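A minimal sketch of how these engineered features could be computed from a formatted trace is given below (Python/pandas; the column names and the reconstructed form of Equations (1) and (2) above are assumptions for illustration).

```python
# Sketch of the engineered features from Equations (1) and (2) plus the
# previous-battery-level feature. Column names are illustrative.
import numpy as np
import pandas as pd

MINS_PER_DAY = 1440

def add_engineered_features(df, sine_amplitude=50.0, cosine_amplitude=0.5):
    """df: minute-wise states with 'minute', 'battery_level', 'external_supply'."""
    battery_level_mean = df["battery_level"].mean()
    battery_external_mean = df["external_supply"].mean()
    phase = 2 * np.pi * df["minute"] / MINS_PER_DAY
    # Eq. (1): daily oscillation around the mean battery level (amplitude <= 50)
    df["sine"] = battery_level_mean + sine_amplitude * np.sin(phase)
    # Eq. (2): night-time charging pattern around the mean supply state (<= 0.5)
    df["cosine"] = battery_external_mean + cosine_amplitude * np.cos(phase)
    # Battery level at the previous sampling time-step (captures the derivative)
    df["prev_battery_level"] = df["battery_level"].shift(1)
    return df

trace = pd.DataFrame({
    "minute": [750, 751, 752],
    "battery_level": [74, 73, 73],
    "external_supply": [0, 0, 1],
})
print(add_engineered_features(trace))
```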
Table 2 outlines descriptive information on all the user activity traces studied in this work, whose formatted data was preprocessed through the procedure described above. In particular, the third, fourth and fifth columns show that heterogeneity is present in aspects such as device brand and model, sampling time range, and the number of records resulting from the preprocessing of the formatted data.
4. Evaluation
In this section, we describe the experiments performed to evaluate the model proposed in the previous section. Specifically, instances of the estimation model are trained using the activity traces of 23 users, and then the resulting model for each user is compared to the method of Kang et al., a similar approach from the literature [19], operating on the same training data. To assess the prediction results and compare the approaches, we used the mean squared error (MSE) metric, which is defined as:

MSE = (1/n) · Σ_{i=1..n} (ŷ_i − y_i)²

where n is the number of predictions, ŷ_i is the predicted battery level and y_i is the real battery level.
An alternative metric we could have employed is the MAE (mean absolute error), but in this case we want to penalize more heavily those estimations that are further away from the real value, such as wrongly estimating that the battery level would be far below or above the real battery level. Note that in the ITNG 2018 version of this paper [20], we evaluated our approach using traces from a single user only.
The Kang et al. method proposed in [19] is based on predicting the battery level via an approach that defines all the possible combinations of sensor states, such as energy supply connected and Wifi connected, energy supply connected and Wifi disconnected, and so on. After that, the method figures out the average time a user spends in each state and the average battery consumption per state. Finally, the method computes the battery level by feeding that information to the following formula:

L(T) = L(0) − T · Σ_i (t_i / Σ_j t_j) · c_i

where T is the number of minutes ahead for which the battery level is to be predicted, t_i is the average number of minutes spent in state i and c_i is the average battery consumption per minute in that state.
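Under this reconstruction (the exact weighting used in [19] may differ; the formula above and the sketch below reflect our reading of the definitions given), the baseline can be sketched as:

```python
# Sketch of the Kang et al. baseline under the reconstruction above: the
# expected per-minute drain is the time-weighted average over sensor states.
def kang_baseline(current_level, avg_minutes, avg_drain_per_minute, T):
    """avg_minutes[i]: average minutes spent in state i;
    avg_drain_per_minute[i]: average battery consumption per minute in i."""
    total = sum(avg_minutes)
    weighted_drain = sum(
        (m / total) * d for m, d in zip(avg_minutes, avg_drain_per_minute)
    )
    return max(0.0, current_level - T * weighted_drain)

# Two states, e.g., (supply on, wifi on) and (supply off, wifi off):
print(kang_baseline(80.0, [300, 1140], [0.0, 0.12], T=300))  # 5 h ahead
```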
Regarding the test setup, both models were trained using the first 10% of the samples of a user's trace data. Then, battery predictions were made for each of the following days in the data-set. In particular, we picked an hour of the day and ran each model to estimate the battery level for the next hours. After that, the mean squared error (MSE) was calculated for each curve and averaged to get the MSE per day for each model. The selected hour of the day was 12 p.m. because it is the time of the day at which mobile users show more activity, and it is when dew schedulers might take most advantage of the model given the consequent abundance of active mobile devices. Besides, making predictions five hours ahead is a good baseline for comparing both approaches, since the model of [19] does not aim to return good estimations in the long run. This can be quickly visualized in Figure 7, where estimations using both models for our sample user are depicted.
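For concreteness, the following sketch mirrors this evaluation protocol (illustrative Python; the trace representation and the model interface are assumptions, and the placeholder model stands in for the real predictors):

```python
# Sketch of the evaluation protocol: train on the first 10% of a trace,
# then score 5-hour-ahead predictions issued daily at 12 p.m. with MSE.
import numpy as np

def mse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

class NaiveModel:
    """Placeholder with the assumed interface: carries the noon level forward."""
    def fit(self, days):
        pass
    def predict_from(self, day, start, horizon):
        return [day[start]] * horizon

def evaluate(trace_days, model, noon=720, horizon=300):
    """trace_days: list of per-day arrays of minute-wise battery levels."""
    split = max(1, int(0.1 * len(trace_days)))    # first 10% for training
    model.fit(trace_days[:split])
    daily_mse = []
    for day in trace_days[split:]:                # one prediction per day
        y_pred = model.predict_from(day, noon, horizon)  # 12 p.m., 5 h ahead
        y_true = day[noon:noon + horizon]
        daily_mse.append(mse(y_true, y_pred))
    return daily_mse

rng = np.random.default_rng(0)
days = [np.clip(80 - 0.05 * np.arange(1440) + rng.normal(0, 2, 1440), 0, 100)
        for _ in range(20)]
print(np.mean(evaluate(days, NaiveModel())))
```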
The final result of this process can be found in Table 4 for the 23 users from the Device Analyzer data-set. The columns show the average MSE per week day at 5 p.m. for both methods. The method that achieved the minimum MSE in each experimental scenario, i.e., a prediction made on a week day at 12 p.m. for five hours ahead, is highlighted in bold. Refer to Table A1 in Appendix A for information on the number of samples involved in the computation of each MSE value. It is clear that our proposed method achieved the lowest MSE in 136 out of the 161 prediction scenarios, where each scenario corresponds to a prediction made for one user-weekday combination. This outcome is due to the fact that the Kang et al. method is based only on the average consumption and does not take into account time-related factors, such as when the battery is going to be charged. Our proposed method, instead, not only takes previous states into account by learning their behavior with a regression model, but also predicts whether the mobile phone is going to be connected to the energy supply in the future. That feature helps the model figure out whether the series is going up or down at a specific time. Besides, by predicting whether the screen is going to be on, the model can adapt the rate at which the battery level decreases or increases (depending on the energy supply).
The complementary models make our approach more precise for longer periods of time. Figure 7 shows the predicted battery level for a period of 36 h using the proposed approach, i.e., from a given starting point the model iteratively predicted, state by state, the battery level until the end of the period. It is clear that there is a significant visual resemblance between the real battery level and the estimated one. Moreover, the energy supply model had a very high influence in predicting the hour at which the battery level was going to increase. For that particular case, the MSE of Kang et al.'s method was 57.67, while our approach obtained 27.86.