*3.2. Fast Mode Selection*

In other methods, affine prediction is evaluated for each block, which takes a significant amount of time. In Reference [10], there are two affine prediction modes, which take even more time. We propose heuristics to avoid computing all possible modes and save on encoding time.

We first decide whether an affine model should be used instead of translation by examining the variance of the optical flow within a given block. The variance is computed as in Equations (9)–(13).

$$\bar{x} = \sum\_{i=0}^{N} \sum\_{j=0}^{N} \frac{flow\_x(i,j)}{N^2} \tag{9}$$

$$
\sigma\_x^2 = \sum\_{i=0}^N \sum\_{j=0}^N \frac{(flow\_x(i,j) - \bar{x})^2}{N^2} \tag{10}
$$

$$\bar{y} = \sum\_{i=0}^{N} \sum\_{j=0}^{N} \frac{flow\_y(i,j)}{N^2} \tag{11}$$

$$
\sigma\_y^2 = \sum\_{i=0}^N \sum\_{j=0}^N \frac{(flow\_y(i,j) - \bar{y})^2}{N^2} \tag{12}
$$

$$
\sigma\_{xy} = \sqrt{\sigma\_x^2 + \sigma\_y^2} \tag{13}
$$

In these equations, $flow\_x(i, j)$ and $flow\_y(i, j)$ represent the horizontal and vertical components of the optical flow at position $(i, j)$. When the resulting variance $\sigma\_{xy}$ is very small, translation for the whole block is likely to be very accurate, as every pixel has nearly the same displacement. In the opposite case, a very high variance usually indicates large discontinuities in the motion field of the current block, so splitting the block into smaller subblocks is likely preferable.
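As an illustration of Equations (9)–(13), the following is a minimal sketch in plain Python. The function name `flow_spread` and the nested-list representation of the flow block are assumptions made for this example and do not come from the encoder described here. The sketch averages over all samples actually present in the block, which is one reading of the summation bounds.

```python
import math


def flow_spread(flow_x, flow_y):
    """Return sigma_xy for one block of optical flow.

    flow_x and flow_y are square 2D lists holding the horizontal and
    vertical flow components of the block (hypothetical layout).
    """

    def variance(block):
        # Flatten the block and compute the population variance,
        # i.e. the mean squared deviation from the block mean.
        samples = [value for row in block for value in row]
        mean = sum(samples) / len(samples)
        return sum((value - mean) ** 2 for value in samples) / len(samples)

    # Equation (13): combine both component variances into one value.
    return math.sqrt(variance(flow_x) + variance(flow_y))


# A block where every pixel moves identically has zero spread.
uniform = flow_spread([[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]])

# A block whose left and right halves move differently has a larger spread.
mixed = flow_spread([[0.0, 2.0], [0.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]])
```

In this sketch, a block with uniform motion yields zero, while a block containing a motion boundary yields a clearly larger value.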

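We use two threshold values on the variance for these cases: when the variance falls below the lower threshold, affine prediction is skipped because translation is expected to suffice, and when it exceeds the upper threshold, affine prediction is skipped because the block is likely to be split.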

To determine suitable threshold values, we ran tests on a few sequences. For the lower bound, 0.01 was determined experimentally to avoid skipping the numerous cases where the best parameter is 1 and the variance would be around 0.05. For the upper bound, we examined the variance across the sequences and found that values over 1 correlated heavily with object boundaries. Setting the threshold to 1 made the skipping too eager, however, so we increased it to 4 to allow for some margin of error.
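The two-threshold decision can be sketched as follows. This is only an illustration: the function name, the returned labels, and the choice of strict inequalities at the boundaries are assumptions, while the values 0.01 and 4 are the thresholds reported above.

```python
LOW_THRESHOLD = 0.01   # below this, translation is assumed to be sufficient
HIGH_THRESHOLD = 4.0   # above this, the block likely spans a motion boundary


def classify_block(sigma_xy, low=LOW_THRESHOLD, high=HIGH_THRESHOLD):
    """Decide whether affine prediction is worth evaluating for a block.

    Returns one of three hypothetical labels:
      "translation": skip affine, the motion is nearly uniform
      "split":       skip affine, splitting the block is likely better
      "affine":      evaluate affine prediction
    """
    if sigma_xy < low:
        return "translation"
    if sigma_xy > high:
        return "split"
    return "affine"


decisions = [classify_block(v) for v in (0.005, 0.05, 1.5, 5.0)]
```

With these thresholds, a variance of 0.05 or 1.5 still leads to affine evaluation, which matches the stated intent of not skipping too eagerly.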

Then, to determine which 3-parameter model fits best, the absolute values of *s* and *r* are compared, and the model corresponding to the larger value is selected. If neither exceeds a small threshold, set to one tenth of the minimal non-zero value for the affine parameter, affine motion estimation is skipped for the current block. This second check is useful because, while the variance heuristics catch those blocks in most cases, some outliers can affect the variance greatly.
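A minimal sketch of this selection step is shown below. The function name, the returned labels `"s"` and `"r"`, the parameter `min_nonzero` (standing for the minimal non-zero value of the affine parameter), and the tie-breaking in favor of *s* are all assumptions made for illustration.

```python
def select_affine_model(s, r, min_nonzero):
    """Pick the 3-parameter model to evaluate, or None to skip affine.

    s, r:        estimated values of the two candidate affine parameters
    min_nonzero: smallest non-zero value the affine parameter can take
                 (hypothetical input; depends on the parameter precision)
    """
    # Skip threshold: one tenth of the minimal non-zero parameter value.
    threshold = min_nonzero / 10.0

    if abs(s) <= threshold and abs(r) <= threshold:
        return None  # both parameters are negligible, skip affine estimation

    # Otherwise keep the model whose parameter has the larger magnitude.
    # Ties go to "s" here, which is an arbitrary choice for this sketch.
    return "s" if abs(s) >= abs(r) else "r"


choice = select_affine_model(0.3, -0.5, min_nonzero=0.1)
```

Here `choice` refers to the model associated with *r*, since its magnitude is the larger of the two.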

To predict values for other pictures in the reference picture list, the displacement is scaled proportionally to the temporal distance between the frames. This approximation is typically accurate enough when the movement stays similar over time. For example, if the first reference picture is at a distance of 1 and the second at a distance of 2, the displacement values are doubled.
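The scaling can be illustrated with a short sketch. The function and parameter names are hypothetical, and the sketch assumes that temporal distances are expressed as non-zero frame counts relative to the current picture.

```python
def scale_displacement(dx, dy, source_distance, target_distance):
    """Scale a displacement estimated for one reference picture to another.

    dx, dy:          displacement measured against the source reference
    source_distance: temporal distance to the source reference (non-zero)
    target_distance: temporal distance to the reference being predicted
    """
    if source_distance == 0:
        raise ValueError("source_distance must be non-zero")
    # Linear motion assumption: displacement grows with temporal distance.
    factor = target_distance / source_distance
    return dx * factor, dy * factor


# Displacement measured at distance 1, predicted for a reference at distance 2.
scaled = scale_displacement(1.5, -2.0, source_distance=1, target_distance=2)
```

For the example in the text, a displacement of (1.5, -2.0) at distance 1 becomes (3.0, -4.0) at distance 2.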
