Design of Efficient Perspective Affine Motion Estimation for VVC Standard

Created by: Byung-gyu Kim

The fundamental motion model of conventional block-based motion compensation in High Efficiency Video Coding (HEVC) is a translational motion model. In the real world, however, the motion of an object is usually a combination of several kinds of motion. In Versatile Video Coding (VVC), block-based 4-parameter and 6-parameter affine motion compensation (AMC) is applied. In natural video, a rigid object more often moves without any regularity than maintains its shape or transforms at a constant rate. For this reason, the AMC is still limited in handling complex motions, and a more flexible motion model is desirable as a new video coding tool. In this paper, we design a perspective affine motion compensation (PAMC) method that can cope with more complex motions such as shear and shape distortion. The proposed PAMC utilizes both perspective and affine motion models. The perspective motion model-based method uses four control point motion vectors (CPMVs) to give a degree of freedom to all four corner vertices. In addition, the proposed algorithm is integrated into the AMC structure so that the existing affine mode and the proposed perspective mode can be selected adaptively. Because a block predicted with the perspective motion model can be an arbitrary quadrilateral rather than a parallelogram, the proposed PAMC is particularly effective for test sequences containing irregular object distortions or dynamic, rapid motions. Our proposed algorithm is implemented on VTM 2.0. The experimental results show that the BD-rate reduction of the proposed technique reaches up to 0.45% and 0.30% on the Y component for the random access (RA) and low delay P (LDP) configurations, respectively.


3. Proposed Perspective Affine Motion Estimation/Compensation

Affine motion estimation is applied in VVC because it is more efficient than translational motion compensation. The coding gain can be increased by estimating motion more precisely in video sequences that contain complex motion. However, it is still limited in accurately capturing all motions in natural video.
The affine transformation model preserves parallelism in the 2D plane and thus cannot work efficiently for sequences containing object distortions or dynamic motions such as shear and 3D affine transformation. In the real world, many moving objects have irregular motions rather than regular translational, rotational, and scaling motions. Therefore, a more elaborate motion model is needed as a video coding tool to estimate motion precisely.
The basic warping transformation model can estimate motion more accurately, but it is not suitable because of its high computational complexity and the bit overhead caused by its large number of parameters. For these reasons, we propose a perspective affine motion compensation (PAMC) method that improves coding efficiency compared with the existing AMC method of VVC. The perspective transformation model-based algorithm adds one more CPMV, which gives a degree of freedom to all four corner vertices of the block for more precise motion vectors. Furthermore, the proposed algorithm is integrated while maintaining the AMC structure. Therefore, it is possible to adopt the optimal mode between the existing encoding modes and the proposed encoding mode.

3.1. Perspective Motion Model for Motion Estimation

Figure 6 shows that the proposed perspective model with four CPs (b) can estimate motion more flexibly than the affine model with three CPs (a). The affine motion model-based MVF of a current block is described by three CPs, which correspond to {mv0, mv1, mv2} in the illustration. For the perspective motion model-based MVF, one more CP is added, so it is composed of four CPs corresponding to {mv0, mv1, mv2, mv3}. As can be seen from Figure 6, one additional vertex of the block can be used, so that motion estimation can be performed on various quadrilateral shapes. The sides of the prediction block obtained through motion estimation based on the perspective motion model can have different lengths and do not have to be parallel. The typical eight-parameter perspective motion model can be described as:
$$
\left\{
\begin{aligned}
x' &= \frac{p_1 x + p_2 y + p_3}{p_7 x + p_8 y + 1},\\
y' &= \frac{p_4 x + p_5 y + p_6}{p_7 x + p_8 y + 1}.
\end{aligned}
\right.
$$
where p1, p2, p3, p4, p5, p6, p7, and p8 are the eight perspective model parameters, and (x', y') is the warped position of sample (x, y). Among them, the parameters p7 and p8 give the perspective effect to the motion model. With this characteristic, although the transformation is still performed in the 2D plane, it can produce the effect of changing the surface onto which the object is projected.
Figure 6. The motion models: (a) 6-parameter affine model with three CPs, (b) perspective model with four CPs.
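As a concrete illustration, the following minimal C++ sketch (illustrative only, not VTM code) applies the eight-parameter mapping above to a sample position; setting p7 = p8 = 0 makes the denominator 1 and reduces the model to the six-parameter affine transform.

```cpp
#include <array>

// Minimal sketch (not VTM code) of the eight-parameter perspective mapping
// in the equation above: sample position (x, y) is warped to (x', y').
// When p7 and p8 are zero, the denominator becomes 1 and the model reduces
// to the six-parameter affine transform.
struct PerspectiveParams {
    double p1, p2, p3, p4, p5, p6, p7, p8;
};

std::array<double, 2> warpPoint(const PerspectiveParams& p, double x, double y) {
    const double denom = p.p7 * x + p.p8 * y + 1.0;
    return { (p.p1 * x + p.p2 * y + p.p3) / denom,
             (p.p4 * x + p.p5 * y + p.p6) / denom };
}
```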
Instead of these eight parameters, we use four MVs to equivalently represent the perspective transformation model, like the technique applied to the AMC of the existing VTM. In video codecs, using MVs is more efficient in terms of coding structure and flag bits. These four MVs can be chosen at any location of the current block. However, in this paper, we choose the points at the top-left, top-right, bottom-left, and bottom-right corners for convenience of model definition. In a W × H block as shown in Figure 7, we denote the MVs of the (0, 0), (W, 0), (0, H), and (W, H) pixels as mv0, mv1, mv2, and mv3, respectively. Moreover, we replace p7·W + 1 and p8·H + 1 with a1 and a2 to simplify the formulas. The six model parameters p1, p2, p3, p4, p5, and p6 can then be solved as in Equation (4):
$$
\left\{
\begin{aligned}
p_1 &= \frac{a_1 (mv_1^h - mv_0^h)}{W}, & p_2 &= \frac{a_2 (mv_2^h - mv_0^h)}{H}, & p_3 &= mv_0^h,\\
p_4 &= \frac{a_1 (mv_1^v - mv_0^v)}{W}, & p_5 &= \frac{a_2 (mv_2^v - mv_0^v)}{H}, & p_6 &= mv_0^v.
\end{aligned}
\right.
\tag{4}
$$
Figure 7. The representation of vertices for perspective motion model.
In addition, p7·W and p8·H can be solved as in Equation (5):
$$
\left\{
\begin{aligned}
p_7 \cdot W &= \frac{(mv_3^h - mv_2^h)(2mv_0^v - mv_1^v) + (mv_3^v - mv_2^v)(mv_1^h - 2mv_0^h)}{(mv_3^v - mv_2^v)(mv_3^h - mv_1^h) + (mv_3^h - mv_2^h)(mv_3^v - mv_1^v)},\\
p_8 \cdot H &= \frac{(mv_3^h - mv_1^h)(2mv_0^v - mv_1^v) + (mv_3^v - mv_1^v)(mv_1^h - 2mv_0^h)}{(mv_3^v - mv_1^v)(mv_3^h - mv_2^h) + (mv_3^h - mv_1^h)(mv_3^v - mv_2^v)}.
\end{aligned}
\right.
\tag{5}
$$
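For illustration, the following sketch computes a1 = p7·W + 1 and a2 = p8·H + 1 directly from the four CPMVs according to Equation (5) as reconstructed above; the MV structure and the function name are hypothetical and not part of VTM.

```cpp
// Sketch of deriving a1 = p7*W + 1 and a2 = p8*H + 1 from the four CPMVs
// mv0..mv3 following Equation (5); illustrative names, not VTM code.
struct MV { double h, v; };  // horizontal and vertical MV components

void deriveA1A2(const MV& mv0, const MV& mv1, const MV& mv2, const MV& mv3,
                double& a1, double& a2) {
    // p7*W from Equation (5)
    const double num7 = (mv3.h - mv2.h) * (2.0 * mv0.v - mv1.v)
                      + (mv3.v - mv2.v) * (mv1.h - 2.0 * mv0.h);
    const double den7 = (mv3.v - mv2.v) * (mv3.h - mv1.h)
                      + (mv3.h - mv2.h) * (mv3.v - mv1.v);
    // p8*H from Equation (5)
    const double num8 = (mv3.h - mv1.h) * (2.0 * mv0.v - mv1.v)
                      + (mv3.v - mv1.v) * (mv1.h - 2.0 * mv0.h);
    const double den8 = (mv3.v - mv1.v) * (mv3.h - mv2.h)
                      + (mv3.h - mv1.h) * (mv3.v - mv2.v);
    a1 = num7 / den7 + 1.0;  // a1 = p7*W + 1
    a2 = num8 / den8 + 1.0;  // a2 = p8*H + 1
}
```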
Based on Equations (4) and (5), we can derive the MV at sample position (x, y) in a CU by the following Equation (6):
$$
\left\{
\begin{aligned}
mv^h(x, y) &= \frac{a_1 \frac{mv_1^h - mv_0^h}{W} x + a_2 \frac{mv_2^h - mv_0^h}{H} y + mv_0^h}{\frac{a_1 - 1}{W} x + \frac{a_2 - 1}{H} y + 1},\\
mv^v(x, y) &= \frac{a_1 \frac{mv_1^v - mv_0^v}{W} x + a_2 \frac{mv_2^v - mv_0^v}{H} y + mv_0^v}{\frac{a_1 - 1}{W} x + \frac{a_2 - 1}{H} y + 1}.
\end{aligned}
\right.
\tag{6}
$$
As with the AMC, the designed perspective motion compensation is also applied through 4 × 4 sub-block-based MV derivation in a CU. Similarly, the motion compensation interpolation filters are used to generate the prediction block.
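A hedged sketch of this sub-block-based derivation is given below; it evaluates Equation (6) at the centre of each 4 × 4 sub-block of a W × H CU. The structure and function names are illustrative, not the VTM implementation.

```cpp
#include <vector>

// Illustrative sketch (not VTM code) of Equation (6): deriving one MV per
// 4x4 sub-block of a W x H CU from CPMVs mv0, mv1, mv2 and the scalars
// a1, a2 obtained from Equation (5).  Each sub-block MV is evaluated at the
// sub-block centre, mirroring the sub-block-based derivation of the AMC.
struct SubBlockMV { double h, v; };

std::vector<SubBlockMV> derivePerspectiveMvField(const SubBlockMV& mv0,
                                                 const SubBlockMV& mv1,
                                                 const SubBlockMV& mv2,
                                                 double a1, double a2,
                                                 int W, int H) {
    std::vector<SubBlockMV> field;
    field.reserve((W / 4) * (H / 4));
    for (int y = 0; y < H; y += 4) {
        for (int x = 0; x < W; x += 4) {
            const double cx = x + 2.0;   // centre sample of the 4x4 sub-block
            const double cy = y + 2.0;
            const double denom = (a1 - 1.0) / W * cx + (a2 - 1.0) / H * cy + 1.0;
            SubBlockMV mv;
            mv.h = (a1 * (mv1.h - mv0.h) / W * cx
                  + a2 * (mv2.h - mv0.h) / H * cy + mv0.h) / denom;
            mv.v = (a1 * (mv1.v - mv0.v) / W * cx
                  + a2 * (mv2.v - mv0.v) / H * cy + mv0.v) / denom;
            field.push_back(mv);
        }
    }
    return field;
}
```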

3.2. Perspective Affine Motion Compensation

Based on the aforementioned perspective motion model, the proposed algorithm is integrated into the existing AMC. A flowchart of the proposed algorithm is shown in Figure 8. Each motion model has its own strength. As the number of parameters increases, the precision of the generated prediction block increases, but at the same time more bit signaling for the CPMVs is required. When the motion of a block is complex, using the perspective motion model with four MVs is effective for reliability. On the other hand, if only two or three MVs are sufficient, using four MVs may be excessive. To take advantage of each motion model, we propose an adaptive multi-motion model-based technique.
Figure 8. Flowchart of the proposed overall algorithm.
After performing fundamental translational ME and MC as in HEVC, the 4-parameter and 6-parameter affine prediction processes are conducted first in step (1). Then, the proposed perspective prediction process is performed as step (2). After that, we determine the best mode between the results of step (1) and step (2) through the RD cost check process for the current CU. Once the best mode is determined, the flags for the prediction mode are signaled in the bitstream. At this point, two flags are required: an affine flag and an affine type flag. If the current CU is finally determined to be in an affine mode, the affine flag is true; otherwise, it is false. In other words, if the affine flag is false, only translational motion is used for ME. An affine type flag is signaled for a CU when its affine flag is true. When the affine type flag is 0, the 4-parameter affine motion model is used for the CU. If the affine type flag is 1, the 6-parameter affine motion model-based mode is used. Finally, when the affine type flag is 2, the current CU is coded in the proposed perspective mode.
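The following sketch summarizes this adaptive mode decision and flag signaling under the assumptions above; the function names and the signaling calls are illustrative placeholders rather than the actual VTM 2.0 syntax.

```cpp
// Hedged sketch of the mode decision and signalling described above; the
// function and type names are illustrative and the actual VTM 2.0 syntax
// and binarization differ.  Affine type flag: 0 = 4-parameter affine,
// 1 = 6-parameter affine, 2 = proposed perspective mode (parsed only when
// the affine flag is true).
enum class CuMode { Translational, Affine4, Affine6, Perspective };

CuMode selectBestMode(double costTrans, double costAff4,
                      double costAff6, double costPersp) {
    CuMode best = CuMode::Translational;
    double bestCost = costTrans;
    if (costAff4  < bestCost) { best = CuMode::Affine4;     bestCost = costAff4;  }
    if (costAff6  < bestCost) { best = CuMode::Affine6;     bestCost = costAff6;  }
    if (costPersp < bestCost) { best = CuMode::Perspective; bestCost = costPersp; }
    return best;
}

void signalModeFlags(CuMode mode) {
    const bool affineFlag = (mode != CuMode::Translational);
    // writeFlag(affineFlag);                // affine flag
    if (affineFlag) {
        const int affineTypeFlag = (mode == CuMode::Affine4) ? 0
                                 : (mode == CuMode::Affine6) ? 1 : 2;
        // writeSymbol(affineTypeFlag);      // affine type flag
        (void)affineTypeFlag;                // suppress unused-variable warning
    }
}
```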

Cite this article

Kim, Byung-Gyu. Design of Efficient Perspective Affine Motion Estimation for VVC Standard. Encyclopedia, 2019, v1. Available online: https://encyclopedia.pub/295