*Registers* = *LocalRegisters* ∪ *ParameterRegisters*

#### *3.2. Method Invocation*

The DVM conforms to the ARM calling convention used for low-level code, in which parameters, return values, return addresses and scope links are placed in registers. The convention dictates how these elements are shared between the caller and the callee. In fact, the two share a part of their register array, so the caller passes arguments to the callee by setting the callee's parameter registers in the right order. For class (static) methods, a lookup procedure searches the list of all static methods that belong to the named class (classes have distinct names) and locates the invoked method through its signature (i.e., name, argument types and number, and return type). Then, its parameter register array is set according to the ARM calling convention: the first argument goes to the first parameter register *p*<sub>1</sub>, and so on up to the last argument, which goes to the last parameter register (*n* arguments lead to *n* parameter registers).

In the dynamic invocation case, the class of the object whose method is being called (the recipient object's class) is statically unknown, so it is first retrieved from the heap through the object's reference (see the semantics section for more details). Then, a lookup procedure searches the method list of that class, and upwards through its super-class chain, for a method matching the given method signature. The register array comprises an additional register for the object reference, called *p*<sub>0</sub> in Smali code. Hence, the actual number of parameter registers is *n* + 1.

The contents of local registers are initially undefined (registers are untyped in Dalvik); their number, however, is statically known.

#### *3.3. Types in Smali*

Smali code has two major classes of types: primitive types and reference types.

Primitive types have a particular notation in Smali: a single letter specifies each type. For example, *V* denotes the void type.

Reference types are objects (i.e., class types) and arrays. A class type takes the form *Lpackagename/ClassName;* where the leading *L* indicates a class type, *packagename* is the path of the package to which the class belongs, and *ClassName* is the class name. For example, a thread object in Smali has the type *Ljava/lang/Thread;*, which is equivalent to *java.lang.Thread* in Java. Arrays take the form [*Type*, where *Type* can be a primitive or a reference type. Arrays with multiple dimensions are written with the corresponding number of "[" characters. For example, a two-dimensional array of ints is written [[*I*, which is equivalent to *int*[][] in Java. Table 1 summarizes the different types in Smali.
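The type notation described above can be illustrated with a small translation routine. The following Python sketch is only an illustration and is not part of Smali+: the function name `smali_to_java` and the table `PRIMITIVES` are names chosen here, and the letter-to-type mapping follows the standard Dalvik type descriptors.

```python
# Single-letter primitive descriptors used by Dalvik/Smali.
PRIMITIVES = {
    "V": "void", "Z": "boolean", "B": "byte", "S": "short", "C": "char",
    "I": "int", "J": "long", "F": "float", "D": "double",
}


def smali_to_java(descriptor: str) -> str:
    """Translate a Smali type descriptor into Java source notation."""
    # Each leading "[" adds one array dimension.
    dims = len(descriptor) - len(descriptor.lstrip("["))
    base = descriptor[dims:]
    if base in PRIMITIVES:
        name = PRIMITIVES[base]
    elif base.startswith("L") and base.endswith(";"):
        # Drop the leading "L" and the trailing ";", then use dots as separators.
        name = base[1:-1].replace("/", ".")
    else:
        raise ValueError(f"not a valid type descriptor: {descriptor!r}")
    return name + "[]" * dims
```

For instance, `smali_to_java("[[I")` yields `int[][]` and `smali_to_java("Ljava/lang/Thread;")` yields `java.lang.Thread`.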



#### **4. Operational Semantics for a Single-Threaded Application**

#### *4.1. Notations*

Throughout the paper, we use the following notations:


#### *4.2. Syntax*

Table 2 provides basic syntactic categories as well as the selected instructions syntax.

A package of a disassembled DEX bytecode is specified by a name *pck* and a sequence of classes. In our formal model, we consider that a package consists only of classes that correspond to *.Smali* files (the AndroidManifest file and the other XML files are not considered in our formalization).

**Table 2.** Smali+: sequential execution.


A class *Cl* definition includes its access flags *Acc-flg*, which are keywords defining the class visibility, and a fully qualified class name *Cfn*, which gives the class package path followed by the class name *c* (we assume an unlimited supply of distinct names). It also includes the fully qualified name of its direct super-class (single inheritance); a distinguished empty value is used for classes without super-classes, such as the Object class and the Thread class. Finally, it includes a set of implemented interfaces *Intf*, fields *Fld* and methods *Mtd*.

An interface is specified by its fully qualified name *Inf*, access flags *Acc-flg*, a set of super-interfaces *Sinf*, its abstract methods (given by their method signatures) and constant fields. A field definition comprises its name *f*, its access flags and a type *τ* (which could be a primitive for static fields or a class type for instance fields). A method definition includes a set of access flags that determines its scope, the method signature, the number of local registers it operates on, denoted by *loc*, and a sequence of labeled instructions *Inst* that form the method body. A method signature consists of the method name *m*, the argument type(s) *τ* and a return type *retτ*, which may be void, a primitive or a class type. In Smali+, we consider a subset of Dalvik instructions selected on the basis of a study of 1700 Android applications, carried out to determine which instructions and language features are most often used in typical applications [16,17]. In fact, Dalvik bytecode comprises 218 instructions [39]. We make some modifications to the selected instructions that do not affect the expressive power of the Dalvik language; rather, they simplify the presentation of our semantics. For example, Dalvik has 13 variants of the *move* instruction that are semantically similar, and we model this group with a single *move* instruction.

In our formal model, we consider instructions expressing the unconditional and conditional jumps, *goto* and *if*-<, respectively, and a *move* instruction that moves values from a source *Src* to a destination *Des*. A destination may be a register name *v*, an instance field *vref*.*f* or a static field *Cfn*.*f*, whereas a source *Src* may be any of these elements or a constant *cst*. We also consider the instructions *new-instance*, *return-void* and *return*, which express the creation of a new object of a class *Cfn*, the return from a void method and the return from a non-void method, respectively. A method invocation refers to the method name, argument types and number, return type and registers. For methods that are dynamically dispatched, it additionally includes a register holding the reference of the recipient object.

#### *4.3. Semantics*

Table 3 defines the domains used by our operational semantics. Each application has at least one thread that defines the code path of execution, and all of the code is processed along this path if no other thread is created. Hereafter, we suppose a single-threaded execution, a simple programming model with a deterministic execution order, in which an instruction has to wait for all preceding instructions to finish before being processed. We model such an execution with a local configuration denoted by *σ*, which captures the full state of a single-threaded program. It includes a call stack *Cs*, a heap *H* and a static heap *S*. The call stack keeps track of all information concerning the methods invoked in the program. It is initially empty and is represented as a sequence of method frames. A method frame *Fm* is a triplet consisting of a method name *m*, a program counter *i* that tracks execution progress (together, these two determine the program point in the invoked method) and a register array *R* mapping register names (parameters, locals and return) to values. We adopt the same notations for registers as in Smali, as explained in the Registers subsection. Therefore, we have a set of registers for the method parameters and a set for the method local variables. The contents of local registers are initially undefined, denoted by ⊥. The top of the call stack represents the frame of the currently executing method. Values can be either primitives or heap locations. The heap *H* maps locations (we suppose an arbitrary number of unique locations) to objects *Obj* or arrays *Arr*. Objects record their class and a mapping from (class) fields to values, whereas arrays record the array type and its values. Finally, the static heap *S* is a mapping from static (class) field names to their values. Fields are annotated with their type, which is used at initialization to determine the default value of each primitive type (see Table 4). This annotation is omitted when it is not needed.
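As a reading aid, the local configuration described above can be mirrored by a few plain data structures. The Python sketch below is an assumption-laden illustration rather than the formal domains of Table 3: the class names `Frame`, `Obj` and `Config`, the helper `new_frame` and the use of `None` for ⊥ are all choices made here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

UNDEF = None  # stands for the undefined value ⊥


@dataclass
class Frame:
    method: str            # method name m
    pc: int                # program counter i
    regs: Dict[str, Any]   # register array R: register name -> value


@dataclass
class Obj:
    cls: str               # class of the object
    fields: Dict[str, Any] # field name -> value


@dataclass
class Config:
    call_stack: List[Frame] = field(default_factory=list)     # Cs, top frame at index 0
    heap: Dict[int, Obj] = field(default_factory=dict)        # H: location -> object
    static_heap: Dict[str, Any] = field(default_factory=dict) # S: static field -> value


def new_frame(method: str, n_locals: int, args: List[Any]) -> Frame:
    """Build a frame whose locals are undefined and whose parameters hold args."""
    regs: Dict[str, Any] = {f"v{i}": UNDEF for i in range(n_locals)}
    regs.update({f"p{i + 1}": a for i, a in enumerate(args)})
    regs["ret"] = UNDEF
    return Frame(method, 0, regs)
```

A configuration starts with an empty call stack; pushing `new_frame("m2", 2, [5, "c"])` gives a frame with two undefined locals and the parameter registers `p1` and `p2` set to the arguments.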

The relation $\sigma \xrightarrow{m(i)} \sigma'$ models the evolution of a starting configuration *σ* into a new configuration *σ*′ as the result of a computation step. *m*(*i*) represents the program point, which corresponds to the instruction at position *i* in a specified method *m*, always for the top-most method frame of the call stack in *σ*.

To illustrate the semantics, we present in Table 5 the semantic rules for instructions presented in Table 2.


**Table 3.** Semantic domains.

**Table 4.** Default values of primitive types.




These rules are as follows. The rule *Rgoto* unconditionally updates the program counter to the specified one. Rules related to a *move* instruction from source to destination use an evaluation function 〚-〛 that evaluates a destination or a source under the current configuration *σ*, except for registers. In that case, for simplicity, we directly use *R*(*v*), always from the top-most method frame of the call stack in *σ*, since 〚*v*〛 is equivalent to *R*(*v*). Constants evaluate to themselves, whereas static and instance fields are evaluated using the static heap *S* and the dynamic heap *H*, respectively, under the current configuration *σ*. The rule *Rmv-reg* evaluates the source sub-expression and then updates the content of the destination register in the register array. Rules *Rmv-insf* and *Rmv-Sf* update an instance field and a static field, respectively, with the content of the source register. Rule *Rmv-cst* is straightforward: after evaluating the source to a constant, it updates the content of the destination register with the constant value.

Rule *Rnew-ins* creates a new object in the heap by reserving memory at a fresh location *l*, loading the class from which it is instantiated and initializing its static fields, each with its default value according to Table 4. Once created, the newly allocated object is returned by storing its heap location in a destination register *v*.

Rules *Rb-op* and *Ru-op* compute a binary or unary expression, respectively, and store the result in the destination register. Rules *Rif-true* and *Rif-false* model the conditional jump. If the guard evaluates to true, execution branches to the targeted program counter (*Rif-true*); otherwise, the program counter is advanced to the next instruction (*Rif-false*). In rules *Rinv-st* and *Rinv-dy*, a lookup function is called to find the appropriate method. In the dynamic case, the method class is retrieved from the heap through the object location *l* held in the register *vref*. In both rules, a new method frame is pushed on top of the call stack. It includes the method name, a program counter set to 0 and a register array *R* set as explained in the Method Invocation subsection. Notice that we increment the program counter of the caller by one so that execution resumes from the correct instruction once the callee returns.
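The evaluation function and the *move* rules can be read operationally. The following Python sketch is a simplified illustration under assumptions made here: operands are encoded as tagged tuples, a frame is a dictionary with `pc` and `regs` entries, and the heap maps a location directly to a dictionary of fields. None of these encodings come from the formal rules of Table 5.

```python
def evaluate(src, regs, heap, static_heap):
    """Evaluate a source operand (the function written 〚-〛 in the text)."""
    kind = src[0]
    if kind == "const":    # constants evaluate to themselves
        return src[1]
    if kind == "reg":      # 〚v〛 = R(v)
        return regs[src[1]]
    if kind == "static":   # static field Cfn.f, read from S
        return static_heap[src[1]]
    if kind == "inst":     # instance field vref.f, read from H
        _, vref, fname = src
        return heap[regs[vref]][fname]
    raise ValueError(f"unknown operand: {src!r}")


def move(dest, src, frame, heap, static_heap):
    """Apply one move step on the top frame and advance its program counter."""
    regs = frame["regs"]
    value = evaluate(src, regs, heap, static_heap)
    kind = dest[0]
    if kind == "reg":       # register destination
        regs[dest[1]] = value
    elif kind == "static":  # static field destination, updates S
        static_heap[dest[1]] = value
    elif kind == "inst":    # instance field destination, updates H
        _, vref, fname = dest
        heap[regs[vref]][fname] = value
    else:
        raise ValueError(f"unknown destination: {dest!r}")
    frame["pc"] += 1
```

For example, `move(("reg", "v1"), ("const", "c"), frame, heap, static_heap)` plays the role of a constant move, and `move(("static", "Lp/c2.x"), ("reg", "v0"), frame, heap, static_heap)` plays the role of a static field update.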

The lookup function searches for a method matching the given method signature $m(\tau_1, \ldots, \tau_n) \xrightarrow{loc} \tau$ in the class with the given fully qualified name and upwards through its super-class chain. Once the method is located, the function returns the method signature together with the number of its local registers. We assume that the identified class and method exist in the package and in the class ancestry, respectively, with an array of local registers. Moreover, we assume that all verification checks are performed by the DVM. For instance, it is verified that the method can be legally accessed by the class. Thus, the invoke rules *Rinv-st* and *Rinv-dy* are safe to execute.

$$\text{lookup}(\textit{MtdSign}, \textit{Cfn}) = \begin{cases} m(\tau_1, \ldots, \tau_n) \xrightarrow{loc} \tau & \text{if } m \in \textit{Cfn} \\ \text{lookup}(\textit{MtdSign}, \textit{Sc}) & \text{if } m \notin \textit{Cfn} \end{cases}$$

Rules *Rret-nv* and *Rret-v* pop the top frame from the call stack and pass on the return value from the callee back to the caller through its return register *ret*. Notice that, in the case of a void method, the return value must be moved to *ret* by the callee before the *return-void* instruction.
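The lookup, invocation and return steps described above can be sketched together. The Python code below is a hedged illustration, not the rules of Table 5: the class table `CLASSES`, the subclass `Lp/c3`, the dictionary-based frames and the function names are assumptions introduced here, and the error raised when no method is found goes beyond the text, which assumes that the method always exists.

```python
# Hypothetical class table: class name -> (super-class or None, {signature: number of locals}).
CLASSES = {
    "Ljava/lang/Object": (None, {}),
    "Lp/c1": ("Ljava/lang/Object", {"m2(int,char)char": 2}),
    "Lp/c3": ("Lp/c1", {}),
}


def lookup(signature, cfn):
    """Search cfn, then its super-class chain, for a method with this signature."""
    while cfn is not None:
        superclass, methods = CLASSES[cfn]
        if signature in methods:
            return signature, methods[signature]
        cfn = superclass
    raise LookupError(f"no method {signature}")


def invoke_static(call_stack, signature, cfn, arg_regs):
    """Push a frame for the callee and advance the caller's program counter."""
    caller = call_stack[0]
    sig, n_locals = lookup(signature, cfn)
    regs = {f"v{i}": None for i in range(n_locals)}  # locals start undefined
    for i, r in enumerate(arg_regs, start=1):
        regs[f"p{i}"] = caller["regs"][r]            # i-th argument -> p_i
    caller["pc"] += 1                                # resume point after return
    call_stack.insert(0, {"method": sig, "pc": 0, "regs": regs})


def return_value(call_stack, reg):
    """Pop the callee frame and pass its result to the caller's ret register."""
    callee = call_stack.pop(0)
    call_stack[0]["regs"]["ret"] = callee["regs"][reg]
```

Calling `invoke_static(stack, "m2(int,char)char", "Lp/c3", ["v0", "v1"])` resolves the method in the super-class `Lp/c1`, pushes a frame with program counter 0 and leaves the caller one instruction further.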

#### **5. Operational Semantics for a Multi-Threaded Program**

Results shown in [17] have highlighted multi-threading as a widely used feature in Android applications, with 90.18% including a reference to java/lang/Thread and 88% using monitors. These high rates motivate us to take this feature into account in our formalization in order to develop a complete semantics.

#### *5.1. Syntax*

Here, we consider multi-threaded programs. Multi-threading semantics include the single-threaded semantics for each running thread separately. Threads in the same DVM interact and synchronize using shared objects and the monitors associated with these objects. In order to give a full account of Java concurrency, we consider instructions related to this aspect. We define macro-instructions that cover methods of the Java Thread API [40], namely *start* for thread spawning and *join* for joining a referenced thread. We also define macro-instructions that cover several methods of the Java Object API [41] related to thread signaling, such as *notify* and *notifyAll*, and to synchronization, such as *wait*. We also give the semantics of the Dalvik instructions related to thread synchronization and monitors, *monitor-enter* and *monitor-exit*. The syntax of all these instructions is given in Table 6.


#### *5.2. Semantics*

An overall configuration Σ = < *Cs*, *Srbl*, *H*, *S* > models the full state of an Android application in its low-level implementation. It represents the configuration of a multi-threaded program: its first attribute is the call stack *Cs* of the running thread, followed by a set of runnable threads *Srbl*, a heap *H* and a static heap *S*.


A new semantic domain for multi-threaded programs is provided in Table 7. Some changes are applied to the object definition. It includes a new field *acq*, which indicates whether the object's monitor is acquired by a thread. If this is the case, *acq* contains this thread's reference; otherwise, it contains the undefined value ⊥. A single reference suffices since an object cannot be reserved by more than one thread at a given time. *Sblck* is a set of blocked threads waiting for the object's monitor to be released. *Swait* is a set of threads pending notification (threads that executed the *wait* instruction). In a multi-threading context, a new object instance is initialized as in the single-threaded environment (with default values). New attributes are initialized as follows:


A class *Cl* is a Thread class if and only if it derives from the Thread class (⊥=Thread), which means that its super-class *Sc* is either the top Thread class (*Cfn* = Ljava/lang/Thread) or another class that extends it. Each thread object has a Boolean *finished* field indicating whether the thread has completed its execution, a set holding the call stacks of the threads waiting to join this thread, and an attribute called *state* indicating the current state of the thread. Each thread has a run method. Thread attributes are initialized as follows:




**Table 7.** Semantic domains for a multi-threaded program.

Table 8 provides the semantics of spawning and scheduling threads. Rule *Rstart* starts a new thread, whose reference is stored in the register *vref*. It internally calls the *run*() method of the referenced thread, which will be executed separately in this thread once it is selected. Therefore, a *lookup*() procedure for its run method is performed, and a separate call stack is created for the new thread, with one frame comprising all information about the thread's *run*() method returned by the *lookup*() function. This thread moves to a "runnable" state in *Srbl*. When it gets a chance to execute, its run() method will be executed. The actual execution of the launched thread is managed by the rule *Rselect*. Notice that, as expressed by the rule *Rstart*, the reference of the launched thread is always stored in the register *p*<sub>0</sub>, and we assume that it remains there for all semantic rules and for all method frames in the thread's call stack.

**Table 8.** Multi-threaded semantics: scheduling.


Rules *Rselect* and *Rstop* manage thread scheduling. Rule *Rselect* selects from *Srbl* one thread to be executed for a time slice *ts*. The state of the selected thread is updated to "Running(*ts*)". The thread's call stack is removed from the runnable set and placed at the first position of the configuration Σ to start execution. The *select*(*Srbl*) function is based on the algorithm of the CFS scheduler for scheduling the threads in *Srbl*. It takes into account the threads' nice values and returns the local state of the selected thread, represented by its current call stack, as well as the time slice allocated to it for execution.

Rule *Rstop* stops, in a monitoring mode (i.e., a mode that monitors the execution time given to each thread), a thread whose allocated time slice to execute a task has expired. We model the timing aspect in our formalism by the function *clock*() which represents the scheduler timer to control running threads.
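The two scheduling rules can be approximated by a short sketch. The Python code below is a deliberately simplified illustration and not an implementation of CFS: the slice formula, the parameter `base_slice_ms`, the dictionary encoding of threads and the choice to return a stopped thread to the runnable collection are all assumptions made here. The argument `elapsed_ms` stands in for the value of the *clock*() function.

```python
def select(runnable, base_slice_ms=10):
    """Pick the runnable thread with the lowest nice value and give it a time slice."""
    chosen = min(runnable, key=lambda t: t["nice"])
    runnable.remove(chosen)
    # Hypothetical slice rule: nice -20 gets the longest slice, nice 19 the shortest.
    ts = base_slice_ms * (20 - chosen["nice"])
    chosen["state"] = ("Running", ts)
    return chosen, ts


def stop(running, runnable, elapsed_ms):
    """Stop the running thread once its slice has expired; return True if stopped."""
    _, ts = running["state"]
    if elapsed_ms >= ts:
        running["state"] = "Runnable"
        runnable.append(running)  # assumed: the thread competes again later
        return True
    return False
```

A list is used for the runnable collection only because dictionaries are not hashable in Python; the order of the list carries no meaning.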

Synchronization in *Dalvik* is modeled by the use of monitors, with the instructions *monitor-enter* and *monitor-exit*. These correspond to the *synchronized* keyword in Java. A monitor is attached to an object and can be acquired and released by threads.

The semantics of these two instructions must fulfill two conditions. The first is the mutually exclusive access to shared objects in the heap by different threads. The second is the cooperation between these threads. Cooperation is modeled by a set of threads waiting for notification when the object is released by another thread. The sole thread running and owning the monitor is in a critical section. Table 9 presents the rules related to synchronization. The *monitor-enter* semantics represent a thread trying to enter the critical section by acquiring the monitor of the object whose reference is stored in a register *vref*. It first checks whether the object is acquired by any other thread. If this is the case, the current thread is blocked (mutually exclusive access condition) and added to the object's blocking set *Sblck*, where it joins any other threads in the same situation (cooperation condition). This case is modeled by the rule *Rblock*. Otherwise, the current thread can take ownership of the monitor. The *acq* attribute is then updated with this thread's reference, and the thread resumes its execution in the critical section. This case is modeled by the rule *Racq*−*mnt*.



The *monitor-exit* semantics represent a thread that reaches the end of the critical section and releases the monitor it owns so that another thread can take ownership, which fulfills the cooperation condition. Rule *RRls*−*mntr* provides these semantics. The current thread must first own the object's monitor. Once this condition is satisfied, the *acq* attribute is updated to the undefined value (the object is free). Then, all waiting threads in *Sblck* are moved to the runnable set *Srbl*. It is up to the scheduler to select which thread to execute (there is no ordering among the blocked threads).
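The acquire, block and release cases can be summarized in a few lines. The following Python sketch is an illustration under assumptions made here: the class `Monitor`, the use of `None` for ⊥, the use of strings as thread references and the exception raised on an illegal release are not taken from Table 9.

```python
class Monitor:
    """Monitor state attached to an object."""

    def __init__(self):
        self.acq = None    # reference of the owning thread, None stands for ⊥
        self.blocked = []  # Sblck: threads waiting for the monitor to be released


def monitor_enter(thread, obj):
    """Return True if the thread now owns the monitor, False if it was blocked."""
    if obj.acq is None or obj.acq == thread:
        obj.acq = thread            # acquire case
        return True
    obj.blocked.append(thread)      # block case: held by another thread
    return False


def monitor_exit(thread, obj, runnable):
    """Release the monitor and move every blocked thread to the runnable set."""
    if obj.acq != thread:
        raise RuntimeError("thread does not own the monitor")
    obj.acq = None                  # the object is free again
    runnable.extend(obj.blocked)    # no ordering among blocked threads is implied
    obj.blocked.clear()
```

After `monitor_exit`, every formerly blocked thread competes again for the monitor once the scheduler selects it.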

A thread can voluntarily give up ownership of the monitor before reaching the end of the critical section by calling the *wait*() method, i.e., by executing the *wait* instruction. The thread releases ownership of the monitor and remains in a waiting state (i.e., suspended or inactive until it is notified by another thread). Rule *Rwait* provides the semantics of the *wait* instruction. The calling thread must own the object's monitor (i.e., it must execute *wait* from inside a synchronized block) and then relinquishes it. Once the monitor associated with the object is released, the current thread is placed in the wait set of this object.

Table 10 presents the rules *Rnotify* and *RnotifyAll*, which express the signaling mechanism. Rule *Rnotify* gives the semantics of waking up a single thread that is waiting for the object's monitor in the waiting set *Swait*. One thread in the set is chosen randomly by the function *random*(). This thread is moved from the waiting set to the runnable set, to be selected later by the scheduler and then processed. The rule *RnotifyAll* is similar to the rule *Rnotify*, except that it wakes all threads in the waiting set, which are all moved to the runnable set *Srbl*. Notice that, in addition to the waiting thread(s) in *Swait*, rules *Rnotify* and *RnotifyAll* release all blocked threads in *Sblck*. The two sets have the same privileges with regard to acquiring the monitor. In other words, waiting threads have no precedence over blocked threads that also want to synchronize on this object.
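The waiting and signaling steps can be sketched in the same style. The Python code below is an illustration under assumptions made here: the class `SharedObject`, the list encoding of *Swait* and *Sblck*, and the use of `random.choice` for the *random*() function are choices of this sketch. The text does not state whether the notifying thread must own the monitor, so no such check is made.

```python
import random


class SharedObject:
    """Synchronization state attached to an object."""

    def __init__(self):
        self.acq = None     # owner of the monitor, None stands for ⊥
        self.blocked = []   # Sblck
        self.waiting = []   # Swait


def wait(thread, obj):
    """The owner releases the monitor and joins the wait set."""
    if obj.acq != thread:
        raise RuntimeError("wait requires owning the monitor")
    obj.acq = None
    obj.waiting.append(thread)


def notify(obj, runnable):
    """Wake one randomly chosen waiting thread, plus all blocked threads."""
    if obj.waiting:
        chosen = random.choice(obj.waiting)
        obj.waiting.remove(chosen)
        runnable.append(chosen)
    runnable.extend(obj.blocked)
    obj.blocked.clear()


def notify_all(obj, runnable):
    """Wake every waiting thread, plus all blocked threads."""
    runnable.extend(obj.waiting)
    obj.waiting.clear()
    runnable.extend(obj.blocked)
    obj.blocked.clear()
```

Woken threads only become runnable; they still have to be selected by the scheduler and acquire the monitor again before continuing.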



Table 11 presents the semantics of the thread finishing and joining instructions. Rules *RJoin-exec* and *RJoin-wait* check whether the joined thread has finished its execution. If so, the current thread resumes execution (*RJoin-exec*). Otherwise, the rule *RJoin-wait* applies: the current running thread is moved into *Sjoin*, the set of threads waiting for the same thread to complete its execution (any monitor acquired by the running thread is not released here). The rule *Rfinish* ensures that, when a thread completes its execution (i.e., its run() method returns), it releases all waiting threads in *Sjoin* by moving them to the runnable set *Srbl*.
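A minimal sketch of the joining and finishing steps follows. It is written in Python with assumptions made here: threads are dictionaries with a `finished` flag and a `join_set` list standing for *Sjoin*, and the function names are illustrative only.

```python
def join(current, target):
    """Return True if current may continue, False if it must wait for target."""
    if target["finished"]:
        return True                      # joined thread already finished
    target["join_set"].append(current)   # otherwise wait in Sjoin
    return False


def finish(thread, runnable):
    """Mark the thread as finished and release every thread waiting to join it."""
    thread["finished"] = True
    runnable.extend(thread["join_set"])
    thread["join_set"].clear()
```

A thread that calls `join` on an unfinished target stays out of the runnable set until `finish` is applied to that target.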



#### **6. Practical Aspects**

We give, hereafter, some practical aspects of Smali<sup>+</sup> through an example. For the sake of simplicity and due to space limitations, we only present an illustration of a single-threaded program in Smali<sup>+</sup> that includes various important instructions such as method call, return, and static and instance field update. As shown in Table 12, the program is sequential and consists of two classes *c*1 and *c*2 belonging to the same package, called *p*. Figure 3 shows the initial configuration. Through this example, we show in detail how the rules are applied and how the configuration evolves at every step. Each rule is followed by the resulting configuration.

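| **Cs** |
|---|
| < *m*<sub>1</sub>, 5, *R* > |

| **H** | **S** |
|---|---|
| | Lp/c2: public x = 0; public y = '\u0000' |
| | Lp/c1: public a = *null*; private b = 0; pv/final c = '\u0000' |

| **R** | *v*<sub>0</sub> | *v*<sub>1</sub> | *v*<sub>2</sub> | *ret* |
|---|---|---|---|---|
| | 5 | ⊥ | ⊥ | ⊥ |

| **R'** | *v*<sub>0</sub> | *v*<sub>1</sub> | *ret* |
|---|---|---|---|
| | ⊥ | ⊥ | ⊥ |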

**Figure 3.** Initial configuration.

The first table corresponds to the call stack *Cs*, which contains the current method frame. The second table corresponds to an empty heap *H*, and the last two tables correspond to the register arrays for methods *m*<sub>1</sub> and *m*<sub>2</sub>, respectively.

The first Smali<sup>+</sup> instruction to execute is the move instruction labeled with 5. It moves a constant, so the rule *Rmv-cst* applies. Since constants evaluate to themselves, the local register *v*<sub>1</sub> of *m*<sub>1</sub> is updated with the constant value and the program counter is incremented.

The next instruction is the unconditional jump *goto*. The rule *Rgoto* therefore applies and sets the program counter to the instruction labeled with 10.

$$R_{goto}\ \frac{m_1(6) = \textbf{goto}\ 10}{\langle \langle m_1, 6, R \rangle :: C_s, H, S \rangle \xrightarrow{m_1(6)} \langle \langle m_1, 10, R \rangle :: C_s, H, S \rangle}$$

*m*<sub>1</sub>(10) is an invocation of a static method, so the rule *Rinv-st* applies. A new frame for the called method is pushed on top of *Cs* and the program counter in the caller's method frame is incremented.


$$R_{inv\text{-}st}\ \frac{\begin{array}{c} m_1(10) = \textbf{invoke-static}\ Lp/c1\ m_2(int, char)char\ v_0, v_1 \\ lookup(m_2(int, char)char, Lp/c1) = m_2(int, char)char\ 2 \\ R' = \{v_0 \mapsto \bot, v_1 \mapsto \bot, p_1 \mapsto R(v_0), p_2 \mapsto R(v_1)\} \end{array}}{\langle \langle m_1, 10, R \rangle :: C_s, H, S \rangle \xrightarrow{m_1(10)} \langle \langle m_2, 0, R' \rangle :: \langle m_1, \mathbf{11}, R \rangle :: C_s, H, S \rangle}$$


After some execution steps, we suppose that the register *v*<sub>1</sub> in *m*<sub>2</sub> has been updated with a new value "CA" and that the current instruction to execute is labeled with 18 in *m*<sub>2</sub>.


The instruction *m*<sub>2</sub>(18) is a return from the non-void method *m*<sub>2</sub>, so the rule *Rret-nv* applies. The top frame of *Cs* is popped and the return value is passed from the callee back to the caller through its return register *ret*.


The instruction *m*<sub>1</sub>(11) is a static field update, so the rule *Rmv-sttf* applies and updates the indicated field in the static heap *S* with the content of the register *v*<sub>0</sub>.


The instruction *m*<sub>1</sub>(12) corresponds to an object creation. The rule *Rnew-instance* therefore applies: it creates a new instance of the class *c*1 in the heap *H*, and all fields are initialized according to their types.

$$R_{new\text{-}ins}\ \frac{\begin{array}{c} m_1(12) = \textbf{new-instance}\ v_2\ Lp/c1 \\ o' = \{|Lp/c1| : (a \mapsto null, b \mapsto \text{'}\backslash\text{u0000'}, c \mapsto \text{'}\backslash\text{u0000'})\} \quad l' \notin dom(H) \end{array}}{\langle \langle m_1, 12, R \rangle :: C_s, H, S \rangle \xrightarrow{m_1(12)} \langle \langle m_1, 13, R[v_2 \mapsto l'] \rangle :: C_s, H[l' \mapsto o'], S \rangle}$$

The instruction *m*<sub>1</sub>(13) is an instance field update, so the rule *Rmv-instf* applies. The register *v*<sub>2</sub> holds the location of the instance *o* in *H*. The instance field in *o* is updated with the content of the source register *v*<sub>1</sub>.

#### **7. Discussion**

So far, we have proposed a formal language for Android programs called Smali<sup>+</sup>. Presented in BNF notation, Smali<sup>+</sup> is a simple language that remains faithful to the original Smali notations and to the .Smali file structure. It contains 12 generalized instructions out of the 218 Dalvik instructions [39] and some macro-instructions modeling the concurrency aspect. These 12 instructions were selected carefully to highlight Dalvik's characteristics, such as its register-based architecture, the assembly-like code of Smali, method invocations, monitors, etc. Macro-instructions were used for the sake of simplification as well as to model multi-threading in Android. All the important API methods that affect a thread's life-cycle were considered in the Smali<sup>+</sup> semantics.

Another important feature that has so far been lacking in Android application semantics is thread scheduling. In general, scheduling consists of picking a thread for execution and allocating an execution time to it, depending on its priority, before selecting a new thread to execute and switching the context. Android applications, including their threads, adhere to the Linux execution environment. Threads are therefore scheduled using the standard scheduler of the Linux kernel, known as the *completely fair scheduler* (CFS). On Linux, the thread priority is called a "nice value". A low nice value corresponds to a high priority and vice versa. In Android, a Linux thread has a niceness value in the range of −20 (most prioritized) to 19 (least prioritized), with a default niceness of 0 [42]. In this work, we presented two rules related to scheduling in Android, *Rselect* and *Rstop*. In the first rule, we introduced a function *select*() that plays the same role as the CFS: it selects the most prioritized thread among the runnable threads by comparing nice values and allocates to it an amount of time for execution. The second rule stops a thread when the allocated time expires, prior to picking a new one through *Rselect*. By the "monitoring mode" mentioned in thread scheduling, we mean a monitor based on the CFS algorithm that tracks each thread for each task executed, and we suppose that each rule in the concurrent context executes under this monitoring mode. The mode was shown only for *Rstop* and omitted in the other rules for simplicity.

The operational semantics are mainly intended to secure Android applications. In upcoming work, we intend to use these semantics to check a number of security properties in order to protect users from rogue applications. Our ultimate goal is to formally enforce security policies on Android applications. That is to say, starting from a Smali<sup>+</sup> program and a formal specification of a security policy, we automatically generate a new, equivalent and secure version of the original program that respects the security policy. Formally, the approach takes, as input, a Smali<sup>+</sup> program *P* and a formal specification of a security policy *φ* and generates, as output, a new version *P*′ that respects *φ*. The new version of the program preserves all the behavior of the original version, except in cases where the security policy is on the verge of being violated. This is equivalent to saying that the traces of *P*′ are the intersection of the traces accepted by *φ* and the traces of *P*. It is formally modeled by (1).

$$P' = P \cap \phi \tag{1}$$
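Equation (1) can be read at the level of trace sets. The Python sketch below is only an illustration of that reading under assumptions made here: traces are tuples of action names, the policy *φ* is given as a predicate on traces, and the example policy and action names are hypothetical. It does not describe the rewriting technique itself.

```python
def enforce(program_traces, policy):
    """Keep exactly the traces of P that the policy accepts (P' = P ∩ φ)."""
    return {trace for trace in program_traces if policy(trace)}


def no_sms_after_contacts(trace):
    """Hypothetical policy: no send_sms action once read_contacts has occurred."""
    if "read_contacts" not in trace:
        return True
    first_read = trace.index("read_contacts")
    return "send_sms" not in trace[first_read + 1:]
```

Every trace kept by `enforce` is a trace of the original program, so the secured version introduces no new behavior.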

Security policies will be enforced through a program-rewriting approach that combines static and dynamic approaches. It rewrites the program statically, according to a given security property, then generates a new executable version that satisfies this property. Security modifications or tests are added at well-calculated points in the program to force it to conform to the security property during execution. In other words, the untrusted code will be transformed into self-monitoring code, with checks placed at specific points in the program. The rewritten version should be equivalent to, but more restrictive than, the original, so that it can avoid potentially dangerous operations before they occur.

The enforced security properties will be specific to the malware and attacks threatening Android applications. These include sensitive information leakage (e.g., SMS contents, call logs, contact information or geographical location), Android financial malware, which exploits premium services to incur financial loss to the user for the benefit of the attacker (for example, by calling or texting premium-rate numbers without the user's consent), and privilege escalation attacks [43]. Therefore, all media that could be exploited by this kind of malware will be checked through security policies, such as Internet access, access to system services including SMS, contacts, telephony, Bluetooth and the Global Positioning System (GPS), as well as APIs resulting from inter-application communication. Such APIs will be easily located in Smali<sup>+</sup>, since it provides the fully qualified class name for each invocation.

#### **8. Conclusions**

In this paper, we have proposed a formal operational semantics for Smali, an assembly-like code generated from reverse engineering Android applications. We called the new formal language Smali<sup>+</sup>. Smali<sup>+</sup> covers the semantics of a large subset of the main Dalvik instructions as well as many important aspects related to multi-threaded programming, which are rarely considered in state-of-the-art works on Android applications. This formal model is meant to be an environment for the formal verification of applications. Broader work on techniques to reinforce the security of Android applications using this formalism is currently underway. We are deeply convinced that this will be of great help in analyzing the security of Android applications and verifying their hidden functions affecting users' privacy, as well as in protecting users from malicious actions.

**Author Contributions:** Conceptualization, M.Z., J.F. and M.M.; methodology, M.Z., M.M. and J.F.; validation, J.F., M.M. and E.P.; formal analysis, M.Z., M.M. and J.F.; investigation, M.Z., J.F., M.M. and E.P.; resources, M.Z., J.F., M.M. and E.P.; writing–original draft preparation, M.Z. and J.F.; writing–review and editing, M.Z. and J.F.; supervision, M.M., J.F. and E.P.; project administration, M.M.; funding acquisition, M.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

