## 3. Proposed Perspective Affine Motion Estimation/Compensation

Affine motion estimation is adopted in VVC because it is more efficient than purely translational motion compensation: coding gain increases when motion is estimated delicately in sequences that contain complex motion. However, it still cannot accurately capture all of the motion present in natural video.

The affine transformation model preserves parallelism in the 2D plane, and therefore cannot work efficiently for sequences containing object distortions or dynamic motions such as shear and 3D affine transformation. In the real world, many moving objects exhibit irregular motion rather than regular translation, rotation, and scaling. A more elaborate motion model is therefore needed for a video coding tool to estimate motion delicately.

The basic warping transformation model can estimate motion more accurately, but it is unsuitable because of its high computational complexity and the bit overhead caused by its large number of parameters. For these reasons, we propose a perspective affine motion compensation (PAMC) method that improves coding efficiency compared with the existing AMC method of VVC. The perspective transformation model-based algorithm adds one more CPMV, which gives a degree of freedom to all four corner vertices of the block for a more precise motion vector. Furthermore, the proposed algorithm is integrated while maintaining the AMC structure, so the encoder can adopt the optimal mode between the existing encoding modes and the proposed one.

#### 3.1. Perspective Motion Model for Motion Estimation

Figure 6 shows that the proposed perspective model with four CPs (b) can estimate motion more flexibly than the affine model with three CPs (a). The affine motion model-based MVF of a current block is described by three CPs, matched to $\{mv_0, mv_1, mv_2\}$ in the illustration. For the perspective motion model-based MVF, one more CP is added, so it is composed of four CPs, matched to $\{mv_0, mv_1, mv_2, mv_3\}$. As can be seen from Figure 6, one additional vertex of the block can be used, so motion estimation can be performed on various kinds of quadrilaterals: each side of the prediction block obtained through perspective-model-based motion estimation can have a different length and does not have to be parallel to the opposite side. The typical eight-parameter perspective motion model can be described as:

$$
x' = \frac{p_1 x + p_2 y + p_3}{p_7 x + p_8 y + 1}, \qquad
y' = \frac{p_4 x + p_5 y + p_6}{p_7 x + p_8 y + 1}
$$

where $p_1, p_2, p_3, p_4, p_5, p_6, p_7$, and $p_8$ are the eight perspective model parameters. Among them, $p_7$ and $p_8$ give the model its perspective character: although the transformation operates in the 2D plane, it can produce the effect of changing the surface onto which the object is projected.
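To illustrate this property, the following sketch (not taken from the paper; all parameter values are ours) applies the eight-parameter model to the corners of a hypothetical 16 × 16 block and shows that, unlike an affine map, a non-zero $p_8$ makes opposite edges of the warped block non-parallel:

```python
# Sketch: eight-parameter perspective warp of a block's corners.
# Parameter names p1..p8 follow the text; numeric values are illustrative.

def perspective_warp(x, y, p):
    """Map (x, y) to ((p1*x + p2*y + p3)/d, (p4*x + p5*y + p6)/d)
    with d = p7*x + p8*y + 1."""
    p1, p2, p3, p4, p5, p6, p7, p8 = p
    d = p7 * x + p8 * y + 1.0
    return ((p1 * x + p2 * y + p3) / d, (p4 * x + p5 * y + p6) / d)

# Identity affine part; the non-zero p8 adds the perspective component.
params = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.01)
corners = [(0, 0), (16, 0), (0, 16), (16, 16)]
warped = [perspective_warp(x, y, params) for (x, y) in corners]

# With p8 != 0 the top and bottom edges end up with different lengths,
# i.e. the warped block is a general quadrilateral, not a parallelogram.
top = warped[1][0] - warped[0][0]     # length of the warped top edge
bottom = warped[3][0] - warped[2][0]  # length of the warped bottom edge
print(top, bottom)
```

With $p_7 = p_8 = 0$ the denominator is 1 and the model degenerates to the ordinary six-parameter affine map, which is why the affine model cannot represent such distortions.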

**Figure 6.** The motion models: (**a**) 6-parameter affine model with three CPs, (**b**) perspective model with four CPs.

Instead of these eight parameters, we use four MVs to equivalently represent the perspective transformation model, like the technique applied to the AMC of the existing VTM. In video codecs, using MVs is more efficient in terms of coding structure and flag bits. The four MVs could be chosen at any locations in the current block; in this paper, we choose the top-left, top-right, bottom-left, and bottom-right points for convenience of model definition. In a $W \times H$ block, as shown in Figure 7, we denote the MVs of the $(0, 0)$, $(W, 0)$, $(0, H)$, and $(W, H)$ pixels as $mv_0$, $mv_1$, $mv_2$, and $mv_3$, respectively, writing $mv_{ih}$ and $mv_{iv}$ for the horizontal and vertical components of $mv_i$. Moreover, we replace $p_7 \cdot W + 1$ and $p_8 \cdot H + 1$ with $a_1$ and $a_2$ to simplify the formulas. The six parameters $p_1, p_2, p_3, p_4, p_5$, and $p_6$ of the model can then be solved as in Equation (4):

$$
\begin{aligned}
p_1 &= \frac{a_1 (W + mv_{1h}) - mv_{0h}}{W}, &
p_2 &= \frac{a_2 \, mv_{2h} - mv_{0h}}{H}, &
p_3 &= mv_{0h},\\
p_4 &= \frac{a_1 \, mv_{1v} - mv_{0v}}{W}, &
p_5 &= \frac{a_2 (H + mv_{2v}) - mv_{0v}}{H}, &
p_6 &= mv_{0v}
\end{aligned}
\tag{4}
$$

**Figure 7.** The representation of vertices for perspective motion model.
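As a numerical check of these relations, the sketch below (our own illustration; the corner MVs and $a_1$, $a_2$ values are hypothetical) recovers $p_1$ through $p_6$ from three corner MVs and verifies that the resulting model reproduces those control-point MVs exactly:

```python
# Sketch: recover p1..p6 from corner MVs, given a1 = p7*W + 1 and
# a2 = p8*H + 1, by imposing the corner constraints of the model.

W, H = 16, 16
mv0 = (1.0, 0.5)    # MV at (0, 0): (horizontal, vertical)
mv1 = (0.5, 0.25)   # MV at (W, 0)
mv2 = (1.5, 1.0)    # MV at (0, H)
a1, a2 = 1.05, 0.95  # assumed, i.e. p7*W = 0.05, p8*H = -0.05

p3, p6 = mv0
p1 = (a1 * (W + mv1[0]) - mv0[0]) / W
p4 = (a1 * mv1[1] - mv0[1]) / W
p2 = (a2 * mv2[0] - mv0[0]) / H
p5 = (a2 * (H + mv2[1]) - mv0[1]) / H
p7, p8 = (a1 - 1.0) / W, (a2 - 1.0) / H

def warp(x, y):
    """MV at (x, y): warped position minus original position."""
    d = p7 * x + p8 * y + 1.0
    return ((p1 * x + p2 * y + p3) / d - x,
            (p4 * x + p5 * y + p6) / d - y)

# The recovered parameters reproduce the three control-point MVs.
print(warp(0, 0), warp(W, 0), warp(0, H))
```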

In addition, $p_7 \cdot W$ and $p_8 \cdot H$ can be solved as in Equation (5), obtained from the remaining constraint at the $(W, H)$ corner:

$$
\begin{bmatrix}
mv_{1h} - mv_{3h} & mv_{2h} - mv_{3h} - W\\
mv_{1v} - mv_{3v} - H & mv_{2v} - mv_{3v}
\end{bmatrix}
\begin{bmatrix} p_7 W \\ p_8 H \end{bmatrix}
=
\begin{bmatrix}
mv_{0h} - mv_{1h} - mv_{2h} + mv_{3h}\\
mv_{0v} - mv_{1v} - mv_{2v} + mv_{3v}
\end{bmatrix}
\tag{5}
$$

Based on Equations (4) and (5), we can derive the MV at sample position $(x, y)$ in a CU by Equation (6):

$$
mv_h(x, y) = \frac{p_1 x + p_2 y + p_3}{p_7 x + p_8 y + 1} - x, \qquad
mv_v(x, y) = \frac{p_4 x + p_5 y + p_6}{p_7 x + p_8 y + 1} - y
\tag{6}
$$

As with the AMC, the designed perspective motion compensation is applied through 4 × 4 sub-block-based MV derivation in a CU. Similarly, the motion compensation interpolation filters are used to generate the prediction block.

#### 3.2. Perspective Affine Motion Compensation

Based on the aforementioned perspective motion model, the proposed algorithm is integrated into the existing AMC. A flowchart of the proposed algorithm is shown in Figure 8. Each motion model has its own strength: as the number of parameters increases, the precision of the generated prediction block increases, but at the same time more bits are required to signal the CPMVs. For blocks with complex motion, using the perspective motion model with four MVs is appropriate for reliability; on the other hand, when two or three MVs are sufficient, using four MVs may be excessive. To take advantage of each motion model, we propose an adaptive multi-motion-model-based technique.

**Figure 8.** Flowchart of the proposed overall algorithm.

After performing the fundamental translational ME and MC as in HEVC, the 4-parameter and 6-parameter affine prediction processes are conducted first in step (1). Then, the proposed perspective prediction process is performed as step (2). After that, the best mode between the results of steps (1) and (2) is chosen by an RD cost check in the current CU. Once the best mode is determined, the flags for the prediction mode are signaled in the bitstream. Two flags are required: an affine flag and an affine type flag. If the current CU is finally determined to be in an affine mode, the affine flag is set to true, and to false otherwise; in other words, if the affine flag is false, only translational motion is used for ME. The affine type flag is signaled for a CU only when its affine flag is true. When the affine type flag is 0, the 4-parameter affine motion model is used for the CU; when it is 1, the 6-parameter affine motion model-based mode is used; finally, when it is 2, the current CU is coded in the proposed perspective mode.
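The decision logic described above can be summarized in a small sketch (our reading of the signaling, not decoder source; the mode names are ours):

```python
# Sketch: mapping the affine flag and affine type flag to a CU's mode.

def decode_cu_mode(affine_flag, affine_type=None):
    """affine_flag False -> translational only; otherwise the affine
    type flag (0, 1, or 2) selects the motion model."""
    if not affine_flag:
        return "translational"
    return {0: "affine-4param",
            1: "affine-6param",
            2: "perspective"}[affine_type]

print(decode_cu_mode(False))     # translational-only CU
print(decode_cu_mode(True, 2))   # CU coded in the proposed perspective mode
```

Note that the "affine type flag" here carries three values, so in a real bitstream it would be coded as a short binarized syntax element rather than a single bit.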