### **1. Introduction**

The JPEG2000 (J2K) standard [1] is a still-image codec that also encompasses the compression of image sequences under the name Motion J2K (MJ2K). The standard relies on the J2K Interactive Protocol (JPIP) [2] to transmit J2K code-streams between clients and servers, offering a high degree of scalability (spatial, temporal and quality). These features make J2K (and its extension MJ2K) especially suitable for managing video repositories and for implementing interactive image/video streaming services [3]. In particular, JPIP has proven very effective for the visualization of petabyte-scale image data of the Sun (Helioviewer Project [4,5]), allowing researchers and the general public alike to explore time-dependent image data from different space-borne observatories, interactively zoom into areas of interest, and play sequences of high-resolution images at various cadences. Figure 1 shows an example of an interaction with the JHelioviewer service.

**Figure 1.** Different instants of a remote browsing session (on a 2048 × 2048-pixel display) of a 4096 × 4096-pixel sequence of extreme ultraviolet images of the Sun, taken by NASA's Solar Dynamics Observatory (SDO), using the JHelioviewer application. Initially, users retrieve the sequence of images at a resolution that fits their display (left subfigure). Notice that the information of the highest spatial resolution level (4096 × 4096) is not transmitted because it cannot be displayed. At any moment of the transmission, users can select a Window Of Interest (WOI), which in this example starts at pixel (1000, 500) and has a resolution of 1024 × 1024 (center subfigure), retrieving this WOI at the highest resolution. During the rest of the transmission (right subfigure), only the code-stream related to that WOI is transmitted. Image credit: NASA/SDO/AIA.

MCJ2K (Motion-Compensated JPEG2000) is a combination of two fundamental stages: (1) MCTF (Motion-Compensated Temporal Filtering) and (2) J2K. Basically, MCTF is a transform that inputs a sequence of images and outputs a sequence of *MCTF-coefficients* (which will simply be called *coefs*), grouped in a collection of temporal sub-bands. Then, these coefs are compressed with J2K, resulting in a collection of J2K code-streams that can be transmitted using JPIP. The R/D performance of MCJ2K can clearly be better than that of MJ2K, depending on the temporal correlation among the input images. As an example, Figure 2 shows a Sun image (of a sequence) decompressed with MJ2K and MCJ2K, at similar bitrates.

**Figure 2.** Reconstruction of one image of a sequence of Sun images using MJ2K and MCJ2K. Left: original Sun image with 512 × 512 pixels and a cadence of 1/12 images/second. Center: the same image (progressively) decompressed with MJ2K at 0.08 kbps. Right: the same image (progressively) decompressed with MCJ2K at 0.04 kbps. Image credit: NASA/SDO/AIA.

MCJ2K is a straightforward extension of MJ2K, and it has been proposed previously [6]. However, the adaptation of MCJ2K to standard JPIP services, such as Helioviewer, is a novel contribution. Furthermore, two novel RA (Rate Allocation; in this document this term refers to the action of sorting the code-stream to provide some kind of scalability, whereas the term "rate control" refers to deciding which information is represented by the code-stream in rate-constrained scenarios) algorithms, OSLA (Optimized Sub-band Layers Allocation) and ESLA (Estimated-slope Sub-band Layers Allocation), are herein proposed and experimentally evaluated. Both algorithms are run at post-compression time to determine an efficient progression of quality layers.

The rest of this document is structured as follows. Section 2 describes the most relevant works related to MCJ2K and RA in wavelet-based video coding. MCJ2K, OSLA and ESLA are detailed in Section 3. Section 4 presents the results of an empirical performance evaluation, Section 5 summarizes our findings, and Section 6 outlines future research.

### **2. Background and Related Work**

The combination of MCTF and J2K has been proposed in previous works. Secker et al. use these techniques to create LIMAT [7], but no RC (Rate Control) or RA algorithms are proposed. The motion information is simply placed first, followed by the texture data.

In [6], Cohen et al. propose two ME (motion estimation)-based J2K codecs. The first is a 2D-pyramid codec with an MCTF step on each spatial level, and a closed-loop coding structure, similar to H.264/SVC [8] and HEVC [9]. The second codec is open-loop, similar to MCJ2K, but the authors do not address the problem of RA among temporal texture sub-bands and motion information.

A similar approach to [7] was designed in [10] and extended in [11] by André et al. Ferroukhi et al. [12] have also recently proposed a similar codec based on second-generation J2K. In these works, using RDO (Rate-Distortion Optimization) [13], an RC algorithm is proposed to determine the contribution of each temporal sub-band. None of these works provides an RA algorithm.

In [14] Barbarien et al. provide some interesting ideas to perform optimal RC at compression time. Before using a 2D-DWT (Discrete Wavelet Transform) [15], all the residue coefficients resulting from the MCTF stage are multiplied by a scaling factor to approximate MCTF to a unitary (energy preserving) transformation. As in [11], an optimal RC among motion and texture data is proposed using RDO. An interesting alternative was proposed in [3], where, similar to the use of quantization in hybrid video coding, a set of R/D slopes can be specified to control the composition of each quality layer (by including those layers whose R/D slopes are higher than or equal to the slope of the corresponding layer). These approaches are optimal only in linear transformation scenarios, a condition which is difficult to satisfy (as will be shown in the experimental results) when ME/MC techniques are used. The compatibility with JPIP is not studied in these works.

As previously mentioned, RA can be performed at decompression time. However, in this case, it can be implemented by the sender (server), the receivers (clients), or both. FAST, proposed by Aulí-Llinàs et al. in [16] and improved by Jimenez-Rodriguez et al. in [17], is a sender-driven RA algorithm for MJ2K sequences. Another interesting MJ2K/MCJ2K sender-driven RA proposal was introduced by Naman et al., which uses Conditional Replenishment (CR) [18] and Motion Compensation (MC) [19]. In Naman's proposals, a server sends those J2K packets related to the regions that the clients should refresh to optimize the quality of the video under bandwidth constraints. These proposals are not fully J2K compliant at the server side (a requirement in standard JPIP services) because some kind of non-J2K-standard logic must be used.

Receiver-driven RA solutions have also been proposed in previous studies. For example, in DASH [20], clients retrieve video streams, requesting (GOP by GOP) those code-stream segments that maximize the user's QoE (Quality of Experience) and the buffer fullness. In [21], Mehrotra et al. propose an improvement of the previous approach in which clients use the R/D information of the video to select (taking into account the desired startup latency, the buffer size, and the estimated network capacity) the optimal number of quality layers (in the case of H.264/SVC), or which quality version of each GOP (in the case of H.264/simulcasting) will be transmitted.

As in [11], in [14] the authors also propose an optimal RA among motion and texture data based on Lagrangian RDO, considering that the distortions are additive (something that can be sub-optimal in those cases where the MCTF is not linear). Such optimization minimizes the distortion for a known bitrate, but not for any possible bitrate (note that when transmitting an image or a sequence of images, such bitrates established at compression time might not be met at decompression time).

### **3. MCJ2K**

### *3.1. Codec Overview*

MCJ2K is a two stages codec (see Figure 3): MCTF performs temporal filtering and MCJ2K compress the sequence of sub-bands. The resulting code-stream (see Figure 4) is a collection of compressed texture (each one composed by coefs) and motion sub-bands. MCJ2K is an open-loop "t+2D" structure. The "t" corresponds to a *T*-levels MCTF (a *T*-levels 1/3 linear 1D-DWT, denoted by MCTF*T*) and the "2D" to a 2D-DWT, provided by the standard J2K codec. MCTF*<sup>T</sup>* exploits the temporal redundancy and 2D-DWT, included as a part of the MJ2K compressors, the spatial redundancy. The set of MJ2K compressors inputs the coefs of each temporal sub-band generated by MCTF*<sup>T</sup>* and perform entropy layered coding.
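As a rough illustration of the temporal stage, the following sketch (a hypothetical, simplified Python model; the real codec motion-compensates the prediction references, which is omitted here) performs one level of a predict-only 1/3 temporal lifting over frames represented as flat sample lists:

```python
def mctf_level(frames):
    """One temporal decomposition level: even frames become the
    low-frequency sub-band L, odd frames are replaced by their
    prediction residue (high-frequency sub-band H).  In the real
    MCTF the references are motion-compensated first."""
    L = [frames[i] for i in range(0, len(frames), 2)]
    H = []
    for i in range(1, len(frames), 2):
        left = frames[i - 1]
        # Mirror the left reference at the sequence boundary.
        right = frames[i + 1] if i + 1 < len(frames) else frames[i - 1]
        H.append([x - (a + b) / 2 for x, a, b in zip(frames[i], left, right)])
    return L, H

def mctf(frames, T):
    """Apply the level T times, yielding L^T and [H^T, ..., H^1]."""
    subbands = []
    for _ in range(T):
        frames, H = mctf_level(frames)
        subbands.insert(0, H)  # later (coarser) levels go first
    return frames, subbands
```

For a constant sequence, all H coefs are zero, which is the intuition behind MCTF removing temporal redundancy.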

In Figure 3, *s* represents the original sequence and [*s*]*<sup>Q</sup>* a progressive approximation of *s*, reconstructed with MCJ2K using *Q* quality layers. MCTF*<sup>T</sup>* transforms *s* into a collection of *T* + 1 temporal texture sub-bands {*L<sup>T</sup>*, {*H<sup>t</sup>*; 1 ≤ *t* ≤ *T*}}, and *T* motion "sub-bands" {*M<sup>t</sup>*; 1 ≤ *t* ≤ *T*}. In Figure 3, the number of levels of MCTF is *T* = 2.

Compared to the MPEG/ITU standards, all the coefs of *L<sup>T</sup>* (in our case, the images of *s* with index *i* × 2*<sup>T</sup>*; *i* = 0, 1, ···) are I-type, and all the coefs of {*H<sup>t</sup>*; 1 ≤ *t* ≤ *T*} are B-type. More details about how MCTF has been implemented can be found in [22], and in our implementation published on GitHub (https://github.com/vicente-gonzalez-ruiz/MCTF-video-coding).

**Figure 3.** Codec architecture.

Figure 4 shows an example of the organization of an MCJ2K code-stream. Nine images have been compressed (although only the first six, *s*<sub>0</sub>, ··· , *s*<sub>5</sub>, are shown) using a GOP size *G* = 4 (i.e., *T* = 2, with *G* = 2*<sup>T</sup>*), except for the first GOP, which always has only one image. MCTF<sup>2</sup> transforms (see Figure 3) the input sequence *s* into 3 texture sub-bands {*L*<sup>2</sup>, *H*<sup>2</sup>, *H*<sup>1</sup>} and 2 motion sub-bands {*M*<sup>2</sup>, *M*<sup>1</sup>}. *L*<sup>2</sup> is the low-frequency texture sub-band, and represents the low-frequency temporal components of *s*. {*H*<sup>2</sup>, *H*<sup>1</sup>} contains the high-frequency temporal components of *s*. {*M*<sup>2</sup>, *M*<sup>1</sup>} stores a description of the motion detected in *s*. In Figure 4, arrows over the motion fields indicate the decoding dependencies between the coefs. When the inverse transform is applied, a succession of increasing temporal resolution levels {*L*<sup>2</sup>, *L*<sup>1</sup>, *L*<sup>0</sup>} is generated. By definition, *L*<sup>0</sup> = *s*.
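As a concrete check of this naming, the sub-bands generated for any *T* can be enumerated with a small helper (illustrative Python, not part of the codec):

```python
def mcj2k_subbands(T):
    """Sub-bands produced by a T-level MCTF (Figure 4 shows T = 2):
    one low-frequency texture sub-band, T high-frequency texture
    sub-bands, and T motion sub-bands."""
    texture = [f"L{T}"] + [f"H{t}" for t in range(T, 0, -1)]
    motion = [f"M{t}" for t in range(T, 0, -1)]
    return texture, motion
```

For example, `mcj2k_subbands(2)` returns `(["L2", "H2", "H1"], ["M2", "M1"])`, matching the decomposition of Figure 4.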

**Figure 4.** Example of a code-stream for MCTF<sup>2</sup>.

MCJ2K implements a full/sub-pixel telescopic-search [23] bidirectional block-matching ME algorithm [24]. The block size *B* is constant (inside a coef), and a search area of *A* pixels is configurable. Exhaustive and logarithmic searches [25] are available. ME/MC operations are performed at the maximum spatial resolution of the sequence. This design decision, which is convenient for progressive-quality visualization of the full-resolution video, implies that the inverse motion compensation process must be performed at the maximum resolution to avoid drift error [26] when a reduced resolution of the images is decoded. Obviously, in this case, the decoder's computing requirements increase, but the memory usage does not grow significantly if the blocks are not all processed in parallel. As an advantage, the quality of the reconstructions is higher than when the ME/MC stage is performed at a lower resolution, because the motion information is always used with the accuracy employed at compression time, which can be sub-pixel.

Motion data are temporally and spatially decorrelated, and losslessly MJ2K-compressed as a sequence of 4-component (2 vectors per macro-block), single-layer (*Q* = 1) images. (Usually, the use of approximate motion information generates severe artifacts in the reconstructed images and increases the non-linearity of the codec; therefore, only one quality layer and lossless coding were used for the motion sub-bands.) The decorrelation process uses an algorithm in which, when no motion data have been received, the inverse MCTF process assumes that all the motion vectors are zero. Thus (in the transmission process), when the decoder knows *M<sup>T</sup>*, the motion vectors of *M*<sup>*T*−1</sup> are assumed to be half of those of *M<sup>T</sup>*, and this linear prediction is used for the remaining temporal resolution levels [27].
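The linear prediction of the missing motion sub-bands can be sketched as follows (a hypothetical helper; motion fields are modeled as lists of (vx, vy) pairs):

```python
def predict_motion_fields(m_T, T):
    """Predict M^{T-1}, ..., M^1 from a received M^T by successively
    halving the vectors, as described in the text.  (When no motion
    data at all has been received, all-zero fields are assumed.)"""
    fields = {T: m_T}
    for t in range(T - 1, 0, -1):
        fields[t] = [(vx / 2, vy / 2) for vx, vy in fields[t + 1]]
    return fields
```

For instance, a vector (4, 2) in *M*<sup>3</sup> would be predicted as (2, 1) in *M*<sup>2</sup> and (1, 0.5) in *M*<sup>1</sup>.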

### *3.2. Bitrate Control*

RC is performed at compression time. In the MJ2K stage, each coef of each texture sub-band is J2K-compressed, producing a layered variable-length code-stream (in Figure 4, *Q* = 3 quality layers). Let *S*<sub>*i*</sub>*<sup>q</sup>* be the *q*-th quality layer of the compressed representation of the coef *S*<sub>*i*</sub> of sub-band *S*, and (*S*<sub>*i*</sub>)*<sup>q</sup>* the quality (i.e., the decrease in distortion) provided by *S*<sub>*i*</sub>*<sup>q</sup>* in the progressive reconstruction of *S*<sub>*i*</sub>. Assuming that the distortion metric is additive, we define

$$\left[\mathcal{S}\_{i}\right]^{q} = \sum\_{j=1}^{q} \left(\mathcal{S}\_{i}\right)^{j},\tag{1}$$

which is the quality of the reconstruction of the coef *S*<sub>*i*</sub> using *q* layers. (In this notation, the first quality layer, in the layer decoding order, has index 1.) We define the *q*-th R/D slope of coef *S*<sub>*i*</sub> as

$$
\lambda\_{S\_i^q} = \frac{[S\_i]^q - [S\_i]^{q-1}}{l(S\_i^q)} = \frac{(S\_i)^q}{l(S\_i^q)}, \tag{2}
$$

where *l*(*S*<sub>*i*</sub>*<sup>q</sup>*) represents the length of *S*<sub>*i*</sub>*<sup>q</sup>*.
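In code, Equations (1) and (2) amount to the following (illustrative helper functions; `gains` holds the per-layer quality increments (*S*<sub>*i*</sub>)*<sup>q</sup>* and `lengths` the layer lengths *l*(*S*<sub>*i*</sub>*<sup>q</sup>*), both 0-indexed):

```python
def reconstruction_quality(gains, q):
    """Eq. (1): quality of the reconstruction using the first q layers."""
    return sum(gains[:q])

def rd_slope(gains, lengths, q):
    """Eq. (2): R/D slope of the q-th quality layer (q starts at 1),
    i.e., the decrease in distortion per unit of code-stream length."""
    return gains[q - 1] / lengths[q - 1]
```

As usual in embedded coding, the gains (and hence the slopes) decrease with *q*, so earlier layers are the most valuable per transmitted byte.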

Owing to how the R/D slopes are chosen in the MJ2K stage, it holds that for any two different coefs *i* and *j* of sub-band *S*

$$
\lambda\_{S\_i^q} = \lambda\_{S\_j^q} \quad \forall q \in \{1, \dots, Q\}. \tag{3}
$$

We define a *sub-band layer* (SL) *S<sup>q</sup>* (of motion or texture; in the case of motion, the definition is identical, but there is only one quality layer) as the collection of quality layers

$$\mathcal{S}^{q} = \{ \mathcal{S}\_{i}^{q}, i = 0, \dots, 2^{T} - 1 \}. \tag{4}$$

For example, in Figure 4, SL (*L*<sup>2</sup>)<sup>1</sup> = {(*L*<sup>2</sup><sub>0</sub>)<sup>1</sup>, (*L*<sup>2</sup><sub>1</sub>)<sup>1</sup>} and SL (*M*<sup>1</sup>)<sup>0</sup> = {(*M*<sup>1</sup><sub>0</sub>)<sup>0</sup>, (*M*<sup>1</sup><sub>1</sub>)<sup>0</sup>}.

Equation (3) has two implications: (1) in general, the total length of the code-stream of each coef will be different (depending on its content), and (2) the bitrate allocation is optimal for each sub-band layer [28].

The *q*-th R/D slope of SL *S<sup>q</sup>* is defined as

$$
\lambda\_{S^q} = \frac{[S]^q - [S]^{q-1}}{l(S^q)} = \frac{(S)^q}{l(S^q)},\tag{5}
$$

where *l*(*S<sup>q</sup>*) represents the length of SL *S<sup>q</sup>*, [*S*]*<sup>q</sup>* the quality of the GOP obtained after decompressing *q* layers, and (*S*)*<sup>q</sup>* the quality provided by the SL *S<sup>q</sup>*.

### *3.3. Post-Compression R/D Allocation*

RA is typically performed at decompression time. In accordance with Part 9, Section C.4.10 of the J2K standard [2], JPIP clients can request J2K images by quality layers. Moreover, as previously shown in [29], it is also possible to perform JPIP requests for a range of images. Therefore, by extension, the JPIP standard can also be used to retrieve complete sub-band layers using a single JPIP request. For example (see Figure 4), if *T* = 2, we decompose a sequence into 3 temporal sub-bands, and the sub-band layer (*H*<sup>2</sup>)<sup>1</sup> has, for each GOP, only one coef; its quality layers {(*H*<sup>2</sup><sub>0</sub>)<sup>1</sup>, (*H*<sup>2</sup><sub>1</sub>)<sup>1</sup>} are what a client would request to retrieve the sub-band layer (*H*<sup>2</sup>)<sup>1</sup>.

It is easy to see that the SLs in an MCJ2K code-stream are

$$\begin{array}{ccccc}L^{T^1} & H^{T^1} & H^{{T-1}^1} & \cdots & H^{1^1} \\ L^{T^2} & H^{T^2} & H^{{T-1}^2} & \cdots & H^{1^2} \\ \vdots & \vdots & \vdots & & \vdots \\ L^{T^Q} & H^{T^Q} & H^{{T-1}^Q} & \cdots & H^{1^Q} \\ M^{T} & M^{T-1} & \cdots & M^{1} & \end{array} \tag{6}$$

and that there are *Q*(*T* + 1) SLs in this set, which is also the number of optimal truncation points of a MCJ2K code-stream.

At decompression time, the order in which the SLs are retrieved from the JPIP server should optimize the R/D curve for any bitrate. For this task, we propose the following two approaches.

### 3.3.1. Optimized SL Allocation (OSLA)

Starting with (*L*<sup>*T*</sup>)<sup>1</sup>, the optimal order of the remaining SLs of a GOP can be determined by applying Equation (5) to each feasible SL and sorting them by slope. Thus, after retrieving (*L*<sup>*T*</sup>)<sup>1</sup> (which always contributes more to the quality of the reconstruction than any other SL), several alternatives {*M*<sup>*T*</sup>, *M*<sup>*T*−1</sup>, ··· , *M*<sup>1</sup>, (*H*<sup>*T*</sup>)<sup>1</sup>, (*H*<sup>*T*−1</sup>)<sup>1</sup>, ··· , (*H*<sup>1</sup>)<sup>1</sup>} should be checked to determine the next SL with the highest possible contribution. Considering that *λ*<sub>*M*<sup>*T*</sup></sub> > *λ*<sub>*M*<sup>*t*</sup></sub>, ∀*t* ∈ {*T* − 1, ··· , 1}, if, for example, *λ*<sub>*M*<sup>*T*</sup></sub> > *λ*<sub>(*H*<sup>*t*</sup>)<sup>1</sup></sub>, ∀*t* ∈ {*T*, ··· , 1}, the next SL to decode should be *M*<sup>*T*</sup>, and the next set of alternatives would be {*M*<sup>*T*−1</sup>, (*H*<sup>*T*</sup>)<sup>1</sup>, (*H*<sup>*T*−1</sup>)<sup>1</sup>, ··· , (*H*<sup>1</sup>)<sup>1</sup>}. Otherwise, if, for example, *λ*<sub>(*H*<sup>*T*</sup>)<sup>1</sup></sub> > *λ*<sub>*M*<sup>*t*</sup></sub> and *λ*<sub>(*H*<sup>*T*</sup>)<sup>1</sup></sub> > *λ*<sub>(*H*<sup>*t*</sup>)<sup>1</sup></sub>, ∀*t* = *T* − 1, ··· , 1, the next SL to decode after (*L*<sup>*T*</sup>)<sup>1</sup> should be (*H*<sup>*T*</sup>)<sup>1</sup>, and the current set of feasible SLs would be {*M*<sup>*T*</sup>, (*H*<sup>*T*</sup>)<sup>2</sup>, (*H*<sup>*T*−1</sup>)<sup>1</sup>, ··· , (*H*<sup>1</sup>)<sup>1</sup>}. Notice also that other SLs could follow (*L*<sup>*T*</sup>)<sup>1</sup>, such as (*L*<sup>*T*</sup>)<sup>2</sup>.

This idea was implemented in the OSLA algorithm (see Algorithm 1). For each GOP, the input sequence *S* of *Q*(*T* + 1) SLs is sorted in descending order by the R/D slopes with which they reconstruct the GOP (see Equation (5)). The output list Λ of sorted-by-slope SLs can be stored in a COM segment of the header of the coef of *L*<sup>*T*</sup> (the SL (*L*<sup>*T*</sup>)<sup>1</sup> is always the first in Λ). Next, JPIP clients retrieve the quality layers of each coef of the GOP in the order specified in Λ.
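The greedy selection described above can be sketched as follows (a simplified model: `slope` stands for the measured slope of Equation (5), which in the real algorithm requires decoding the GOP, and `next_layer` unlocks the following quality layer of the same sub-band; all names here are illustrative):

```python
def osla(initial, slope, next_layer):
    """Greedy OSLA sketch: repeatedly pick the feasible sub-band layer
    with the highest R/D slope, then make its successor feasible."""
    feasible = set(initial)
    order = []
    while feasible:
        best = max(feasible, key=slope)
        order.append(best)
        feasible.remove(best)
        succ = next_layer(best)
        if succ is not None:
            feasible.add(succ)
    return order
```

For example, with slopes {"L2.1": 9.0, "M2": 6.0, "H2.1": 5.0, "L2.2": 2.0} and "L2.2" unlocked by "L2.1", the output order is ["L2.1", "M2", "H2.1", "L2.2"].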


### 3.3.2. Estimated-Slope SL Allocation (ESLA)

Ignoring any possible effect of the non-linear behavior of the ME/MC stage, our implementation of MCTF approximates a biorthogonal transform and, therefore, each sub-band {*L*<sup>*T*</sup>, *H*<sup>*T*</sup>, ··· , *H*<sup>1</sup>} contributes a different amount of energy to the reconstruction of the sequence. This can easily be verified by comparing the energy that the coefs of each temporal sub-band contribute to the reconstruction of the sequence [30]. How much energy a coef must contribute to the code-stream to approximate MCTF to an orthonormal (energy-preserving) transform is represented by attenuation values (see Table 1)

$$\alpha\_{H^t} = \frac{E(L^T)}{E(H^t)},\tag{7}$$

where *<sup>E</sup>*(·) represents the signal energy. These attenuations are empirical, specifically determined for the 1/3 ME-driven DWT implemented in our codec (for a different transform, other values would be obtained).
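Equation (7) can be computed directly from the coef samples (an illustrative sketch; coefs are modeled as flat sample lists):

```python
def energy(samples):
    """Signal energy of a coef: the sum of its squared samples."""
    return sum(x * x for x in samples)

def attenuation(L_T, H_t):
    """Eq. (7): attenuation of sub-band H^t relative to L^T."""
    return energy(L_T) / energy(H_t)
```

A sub-band with one quarter of the energy of *L*<sup>*T*</sup> thus gets an attenuation of 4, meaning its slopes must be divided by 4 before being compared.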


**Table 1.** L2-norm (energy) of the MCTF basis functions for the temporal sub-bands, expressed as attenuation values.

The ESLA algorithm incorporates these attenuations to scale the R/D slopes of each SL of each GOP when these slopes have been determined by considering only the reconstruction of the corresponding coef (not the reconstruction of the full GOP, as OSLA does; notice that, for this reason, OSLA does not need such attenuations). Thus, for example, an R/D slope for a quality layer of a coef of the sub-band *H*<sup>3</sup> resulting from an MCTF<sup>5</sup> is divided by 4.061. In cases where there is more than one coef in a temporal sub-band, as in this example, the average of all the scaled slopes is used to determine the contribution of the corresponding SL.

This idea was implemented in ESLA (see Algorithm 2). As in OSLA, for each GOP, the input sequence *S* of *Q*(*T* + 1) SLs is sorted in descending order by their estimated R/D slopes, but now the slope of each SL is computed directly as a weighted average of the R/D slopes of the quality layers of the corresponding coefs. If these slopes are predefined (i.e., the compression of the coefs uses the same set of *Q* slopes for all the coefs), ESLA can be run at the receiver side without transmitting any R/D information. This means that the JPIP client can determine the order Λ of the SLs for all the GOPs of the sequence after receiving only *T* and *Q*, and knowing the sub-band attenuations (Table 1), which do not depend on the sequence. For this reason, ESLA is more suitable than OSLA for real-time streaming scenarios.

### **Algorithm 2:** ESLA algorithm.

1. for each GOP:
2. &nbsp;&nbsp; Λ = [ ]; *i* = 0
3. &nbsp;&nbsp; for each *q* ∈ {1, ··· , *Q*}:
4. &nbsp;&nbsp;&nbsp;&nbsp; Λ[*i*++] = input {*λ*<sub>(*H*<sup>*T*</sup>)<sup>*q*</sup></sub>, ··· , *λ*<sub>(*H*<sup>1</sup>)<sup>*q*</sup></sub>}
5. &nbsp;&nbsp; for each *λ*<sub>*k*</sub> ∈ Λ:
6. &nbsp;&nbsp;&nbsp;&nbsp; *λ*<sub>*k*</sub> = *λ*<sub>*k*</sub>/*α*<sub>*k*</sub>
7. &nbsp;&nbsp; Λ[*i*++] = input {*λ*<sub>(*L*<sup>*T*</sup>)<sup>1</sup></sub>, ··· , *λ*<sub>(*L*<sup>*T*</sup>)<sup>*Q*</sup></sub>}
8. &nbsp;&nbsp; sort\_in\_descending\_order Λ
9. &nbsp;&nbsp; output Λ
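A compact Python rendering of Algorithm 2 (an illustrative sketch; `slopes` maps each sub-band-layer identifier, e.g. ("H2", 1), to the average slope of the quality layers of its coefs, and `alpha` holds the attenuations of Table 1):

```python
def esla(slopes, alpha):
    """Estimate the slope of every sub-band layer by dividing it by the
    attenuation of its sub-band (L^T is not attenuated), then return
    the SLs sorted in descending order of estimated slope."""
    estimated = {sl: s / alpha.get(sl[0], 1.0) for sl, s in slopes.items()}
    return sorted(estimated, key=estimated.get, reverse=True)
```

For example, `esla({("L2", 1): 8.0, ("H2", 1): 6.0, ("H1", 1): 6.0}, {"H2": 2.0, "H1": 4.0})` returns `[("L2", 1), ("H2", 1), ("H1", 1)]`: although both H sub-bands have the same raw slope, the deeper one contributes more energy and is retrieved first.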

### **4. Evaluation**

The performance of MCJ2K was evaluated for different working configurations and compared to previous proposals.

### *4.1. Materials and Methods*

Several test videos were used for our evaluation:


In all experiments, 129 images were compressed, and the search range for ME was 4 pixels, using full-pixel accuracy (*A* = 0). The block size (*B*) was 32 × 32 for Mobile, Container and Crew, 64 × 64 for CrowdRun and ReadySetGo, and 128 × 128 for Sun. The parameters used for compressing the coefs and the images were 5 levels of DWT, no precinct partition, and code-blocks of 64 × 64 coefficients. The number of quality layers (*Q*) was 8, which provides a good tradeoff between compression performance and rate-allocation granularity. In the case of the motion data, *Q* = 1 and no DWT was used.

### *4.2. Impact of Motion Compensation*

Figure 5 shows the performance of MCJ2K compared to MJ2K for different GOP sizes. Each video was compressed once and decompressed progressively, sorting the sub-band layers using OSLA. MCJ2K was superior to MJ2K in most cases, depending on the temporal correlation found in each video. For example, MCTF is very efficient for Container: at 300 kbps, MCJ2K is about 10 dB better than MJ2K. However, in the case of ReadySetGo, for which MCTF is not able to generate accurate predictions, using a GOP size larger than 4 does not increase the quality of the reconstructions. Therefore, the GOP size has a high impact on the performance of MCJ2K and is a parameter that should be optimized for every video sequence. Nevertheless, GOP sizes of 4 and 8 can be expected to work well for most sequences. We would like to highlight that the MC model used in MCJ2K is very basic. More advanced predictors, such as those used in the latest video coding standards cited earlier, would facilitate the use of larger GOP sizes and, therefore, higher compression ratios.


**Figure 5.** MCJ2K (OSLA) vs. MJ2K for different GOP sizes and sequences.

### *4.3. MCJ2K (Using OSLA or ESLA) vs. MJ2K*

Using the information provided by the previous experiments, we selected a suitable GOP size for each sequence and compared the performance of OSLA and ESLA with respect to MJ2K. The results are shown in Figure 6. As can be seen, the performance of both RA algorithms is similar, which means that, although the MCTF process used by MCJ2K is not linear, ESLA can make a reasonable prediction of the impact of the SLs while running much faster than OSLA. For this reason, only ESLA is used in the following experiments.

**Figure 6.** MCJ2K (using OSLA or ESLA) vs. MJ2K for different sequences.

### *4.4. MCJ2K vs. Other Video Codecs*

Figure 7 shows the compression performance of MCJ2K (using ESLA and the optimized compression parameters found in the previous experiments) and other standard video codecs. Dashed lines represent non-embedded decoding, while solid lines represent the progressive decoding provided by scalable codecs. As can be seen, compared with non-scalable video codecs (which generally produce videos with a better R/D ratio than scalable video codecs), such as HEVC (https://hevc.hhi.fraunhofer.de/svn/svn\_HEVCSoftware, using trunk/cfg/encoder\_randomaccess\_main.cfg) or AVC (http://www.videolan.org/developers/x264.html, using --profile high --preset placebo --tune psnr), MCJ2K needs approximately 50% more data to achieve the same quality; this difference is much smaller when MCJ2K is compared with SHVC (https://hevc.hhi.fraunhofer.de/svn/svn\_SHVCSoftware/, using branches/SHM-dev/cfg/encoder\_randomaccess\_scalable.cfg and branches/SHM-dev/cfg/misc/layers8.cfg), where MCJ2K produces better results for some of the test videos (even using a very basic MCTF scheme). In the case of MPEG-2 (http://linux.die.net/man/1/mpeg2enc, a codec that implements a motion-compensation scheme similar to the one used in MCJ2K), MCJ2K outperforms it consistently. These results are in line with the basic ME prediction model used in MCJ2K, which is not the focus of this research work.

**Figure 7.** MCJ2K (using ESLA) vs. other codecs for different sequences.
