Video super resolution is the process of generating high-resolution video frames from given low-resolution ones. The main goal is to restore fine details while preserving coarse structure. Many approaches exist for this task, but it remains a popular and challenging problem.
Most research works model the degradation process of frames as
[math]\displaystyle{ y = (x * k)\downarrow{_s} + n }[/math]
where [math]\displaystyle{ x }[/math] — original high-resolution frame,
[math]\displaystyle{ k }[/math] — blur kernel,
[math]\displaystyle{ * }[/math] — convolution operation,
[math]\displaystyle{ \downarrow{_s} }[/math] — downscaling operation with factor [math]\displaystyle{ s }[/math],
[math]\displaystyle{ n }[/math] — additive noise,
[math]\displaystyle{ y }[/math] — low-resolution frame.
Super resolution is the inverse operation: estimating [math]\displaystyle{ x }[/math] from the input [math]\displaystyle{ y }[/math]. The video super resolution problem is therefore to estimate a video sequence {[math]\displaystyle{ \overline{x} }[/math]} from the video sequence {[math]\displaystyle{ y }[/math]} so that {[math]\displaystyle{ \overline{x} }[/math]} is close to the original {[math]\displaystyle{ x }[/math]}. To do this well, one needs to estimate the blur kernel, the downscaling operation, and the additive noise for the given input.
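The degradation model above can be illustrated with a minimal 1-D sketch, assuming a simple box-style blur kernel, nearest-sample downscaling, and a given noise vector (real degradations are 2-D and their parameters are unknown):

```python
# Toy sketch of the degradation y = (x * k) downscaled by s, plus noise n.
def degrade(x, k, s, n):
    """Blur signal x with kernel k, keep every s-th sample, add noise n."""
    half = len(k) // 2
    blurred = []
    for i in range(len(x)):
        acc = 0.0
        for j, kj in enumerate(k):
            idx = i + j - half          # zero padding outside the signal
            if 0 <= idx < len(x):
                acc += kj * x[idx]
        blurred.append(acc)
    down = blurred[::s]                 # downscaling: keep every s-th sample
    return [d + e for d, e in zip(down, n)]

x = [0, 0, 4, 4, 4, 0, 0, 0]            # "high-resolution" signal
k = [0.25, 0.5, 0.25]                   # blur kernel
y = degrade(x, k, s=2, n=[0.0] * 4)     # noiseless for illustration
print(y)  # [0.0, 3.0, 3.0, 0.0]
```

Super resolution would then have to recover `x` given only `y`, without knowing `k`, `s`, or the noise.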
We can use single image super resolution methods to generate high-resolution frames independently of their neighbors. Working with video, we can also benefit from temporal information. A few traditional methods treat the video super resolution task as an optimization problem. In recent years, deep learning based methods for video upscaling have outperformed the traditional ones.
There are several traditional methods for video upscaling. These methods try to exploit natural image priors and to effectively estimate motion between frames. The high-resolution frame is then reconstructed from both the priors and the estimated motion.
First, the low-resolution frame is transformed to the frequency domain. The high-resolution frame is estimated in this domain, and the result is finally transformed back to the spatial domain. Some methods use the Fourier transform, which helps to extend the spectrum of the captured signal and thus increase resolution. There are different approaches within this family: weighted least squares theory,[1] the total least squares (TLS) algorithm,[2] space-varying[3] or spatio-temporal[4] varying filtering. Other methods use the wavelet transform, which helps to find similarities in neighboring local areas.[5] Later, the second-generation wavelet transform was used for video super resolution.[6]
Iterative back-projection methods assume some mapping between low-resolution and high-resolution frames and refine the guessed mapping at each step of an iterative process.[7] Projection onto convex sets (POCS), which defines a specific cost function, can also be used in iterative methods.[8]
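The back-projection loop can be sketched in 1-D: start from an upscaled guess, repeatedly downscale it, compare with the observed low-resolution signal, and back-project the error. Simple averaging and sample repetition stand in here for the real degradation and back-projection operators, which are method-specific:

```python
# Toy iterative back-projection sketch in 1-D.
def downscale(x, s=2):
    # average every s consecutive samples
    return [sum(x[i:i + s]) / s for i in range(0, len(x), s)]

def upscale(x, s=2):
    # repeat every sample s times
    return [v for v in x for _ in range(s)]

def back_project(y, s=2, iters=10, step=1.0):
    hr = upscale(y, s)                                   # initial guess
    for _ in range(iters):
        err = [yi - di for yi, di in zip(y, downscale(hr, s))]
        hr = [h + step * c for h, c in zip(hr, upscale(err, s))]
    return hr

y = [1.0, 3.0]
hr = back_project(y)
print(downscale(hr))  # [1.0, 3.0] — consistent with the observation
```

The iterations drive the estimate toward consistency with the observed low-resolution signal; the choice of back-projection operator controls which consistent solution is reached.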
Iterative adaptive filtering algorithms use the Kalman filter to estimate the transformation from low-resolution frames to high-resolution ones.[9] To improve the final result, these methods exploit temporal correlation among the low-resolution sequences; some approaches also exploit temporal correlation within the high-resolution sequence.[10] A common way to approximate the Kalman filter is Least Mean Squares (LMS).[11] One can also use steepest descent,[12] Least Squares (LS),[13] or Recursive Least Squares (RLS).[13]
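A minimal sketch of the LMS idea, applied here to the hypothetical task of identifying a 2-tap filter from input/output samples (the cited methods apply the same update rule to the frame-transformation problem):

```python
# Minimal LMS (least mean squares) sketch: w <- w + mu * error * input.
def lms(xs, ds, n_taps=2, mu=0.1, epochs=200):
    w = [0.0] * n_taps
    for _ in range(epochs):
        for i in range(n_taps - 1, len(xs)):
            window = xs[i - n_taps + 1:i + 1][::-1]     # most recent sample first
            y = sum(wi * xi for wi, xi in zip(w, window))
            e = ds[i] - y                                # prediction error
            w = [wi + mu * e * xi for wi, xi in zip(w, window)]
    return w

# unknown system to identify: d[i] = 0.5*x[i] + 0.25*x[i-1]
xs = [1, 0, 2, 1, 3, 0, 1, 2, 1, 0]
ds = [0.5 * xs[i] + (0.25 * xs[i - 1] if i else 0) for i in range(len(xs))]
w = lms(xs, ds)
print(w)  # approaches [0.5, 0.25]
```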
Direct methods estimate motion between frames, upscale a reference frame, and warp neighboring frames to the high-resolution reference. To construct the result, the upscaled frames are fused together by a median filter,[14] weighted median filter,[15] adaptive normalized averaging, an AdaBoost classifier[16] or SVD based filters.[17]
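The median-filter fusion mentioned above can be sketched as a per-pixel median over the aligned frames; this makes the fusion robust to an outlier value in one frame:

```python
import statistics

# Fuse several aligned, upscaled frames with a per-pixel median.
def median_fuse(frames):
    return [statistics.median(pix) for pix in zip(*frames)]

frames = [
    [10, 52, 30],   # aligned frame 1
    [12, 50, 31],   # aligned frame 2
    [11, 90, 29],   # aligned frame 3, with an outlier at position 1
]
print(median_fuse(frames))  # [11, 52, 30] — the outlier 90 is rejected
```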
Non-parametric algorithms join motion estimation and frame fusion into one step, performed by considering patch similarities. Fusion weights can be calculated by nonlocal-means filters.[18] To strengthen the search for similar patches, one can use a rotation-invariant similarity measure[19] or an adaptive patch size.[20] Calculating intra-frame similarity helps to preserve small details and edges.[21] Parameters for fusion can also be calculated by kernel regression.[22]
Probabilistic methods use statistical theory to solve the task. Maximum likelihood (ML) methods estimate the most probable image.[23][24] Another group of methods uses maximum a posteriori (MAP) estimation. The regularization parameter for MAP can be estimated via Tikhonov regularization.[25] Markov random fields (MRF) are often used together with MAP and help to preserve similarity in neighboring patches.[26] Huber MRFs are used to preserve sharp edges.[27] A Gaussian MRF can smooth some edges but removes noise.[28]
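The MAP estimate mentioned above can be written, under a standard Bayesian formulation (not specific to any single cited method), as
[math]\displaystyle{ \overline{x} = \arg\max_x \, p(y \mid x)\, p(x) }[/math]
where the likelihood [math]\displaystyle{ p(y \mid x) }[/math] follows from the degradation model, and the prior [math]\displaystyle{ p(x) }[/math] (for example, an MRF) encodes assumptions about natural images.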
In approaches with alignment, neighboring frames are first aligned with the target one. Frames can be aligned by performing motion estimation and motion compensation (MEMC) or by using deformable convolution (DC). Motion estimation gives information about the motion of pixels between frames. Motion compensation is a warping operation that aligns one frame to another based on this motion information.
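The motion-compensation step can be sketched as warping a neighboring frame toward the target using a per-pixel motion field. The integer-valued flow field is assumed given here; real methods estimate it (usually at sub-pixel precision) with optical flow:

```python
# Warp a frame using a per-pixel (dy, dx) motion field.
def warp(frame, flow):
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy, sx = y + dy, x + dx          # source position in the frame
            if 0 <= sy < h and 0 <= sx < w:  # pixels sampled outside stay 0
                out[y][x] = frame[sy][sx]
    return out

frame = [[1, 2], [3, 4]]
flow = [[(0, 1), (0, -1)], [(0, 0), (0, 0)]]  # swap the top-row pixels
print(warp(frame, flow))  # [[2, 1], [3, 4]]
```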
Another way to align neighboring frames with the target is deformable convolution. While an ordinary convolution has a fixed sampling grid, a deformable convolution first estimates offsets for the kernel's sampling positions and then performs the convolution.
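A toy 1-D sketch of the deformable-convolution idea: each kernel tap samples the input at its base position plus an offset. The offsets are assumed given here; real layers predict them per location with a separate convolution:

```python
# 1-D deformable convolution with given integer offsets per position and tap.
def deformable_conv1d(x, k, offsets):
    half = len(k) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, kj in enumerate(k):
            idx = i + j - half + offsets[i][j]  # shifted sampling position
            if 0 <= idx < len(x):               # zero padding outside
                acc += kj * x[idx]
        out.append(acc)
    return out

x = [1.0, 2.0, 3.0, 4.0]
k = [0.5, 0.5]
offsets = [[0, 0]] * len(x)  # zero offsets reduce to ordinary convolution
print(deformable_conv1d(x, k, offsets))  # [0.5, 1.5, 2.5, 3.5]
```

With nonzero offsets, the same kernel can gather samples from displaced positions, which is what lets the layer align a neighboring frame implicitly.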
Some methods align frames using a homography calculated between them.
Methods without alignment do not align frames as a first step; they process the input frames directly.
While 2D convolutions work in the spatial domain, 3D convolutions use both spatial and temporal information. They perform motion compensation and maintain temporal consistency.
Recurrent convolutional neural networks perform video super resolution by storing temporal dependencies.
Non-local methods extract both spatial and temporal information. The key idea is to compute each output position as a weighted sum over all possible positions. This strategy may be more effective than local approaches.
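The weighted sum over all positions can be sketched in 1-D, with similarity weights from a softmax over negative squared differences. This is an illustrative stand-in: actual non-local blocks compute similarities between learned embeddings rather than raw values:

```python
import math

# Non-local weighted sum: every output is a similarity-weighted
# average over ALL input positions, not just a local neighborhood.
def non_local_1d(x):
    out = []
    for xi in x:
        weights = [math.exp(-(xi - xj) ** 2) for xj in x]
        z = sum(weights)  # normalization (softmax denominator)
        out.append(sum(w * xj for w, xj in zip(weights, x)) / z)
    return out

x = [1.0, 1.1, 5.0, 0.9]
out = non_local_1d(x)
print(out)  # similar values are averaged together; the outlier 5.0 barely moves
```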
The common way to estimate the performance of video super resolution algorithms is to use objective metrics such as PSNR and SSIM.
Another way to assess the performance of a video super resolution algorithm is to organize a subjective evaluation: people are asked to compare the corresponding frames, and the final mean opinion score (MOS) is calculated as the arithmetic mean over all ratings.
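PSNR, the most widely reported of these metrics, can be computed as follows for 8-bit pixel values (this is the standard definition, not tied to any one benchmark):

```python
import math

# PSNR between a restored frame and its ground truth (flattened pixel lists).
def psnr(ref, out, max_val=255.0):
    mse = sum((r - o) ** 2 for r, o in zip(ref, out)) / len(ref)
    if mse == 0:
        return float('inf')  # identical frames
    return 10 * math.log10(max_val ** 2 / mse)

ref = [100, 120, 140, 160]
out = [101, 119, 141, 159]  # off by 1 everywhere -> MSE = 1
print(round(psnr(ref, out), 2))  # 48.13
```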
While deep learning approaches to video super resolution outperform traditional ones, it is crucial to form a high-quality dataset for evaluation. It is important to verify models' ability to restore small details, text, and objects with complicated structure, and to cope with large motion and noise.
Dataset | Videos | Mean video length | Ground-truth resolution | Motion in frames | Fine details |
---|---|---|---|---|---|
Vid4 | 4 | 43 frames | 720×480 | Without fast motion | Some small details, without text |
SPMCS | 30 | 31 frames | 960×540 | Slow motion | A lot of small details |
Vimeo-90K (test SR set) | 7824 | 7 frames | 448×256 | A lot of fast, difficult, diverse motion | Few details, text in a few sequences |
Xiph HD (complete sets) | 70 | 2 seconds | from 640×360 to 4096×2160 | A lot of fast, difficult, diverse motion | Few details, text in a few sequences |
Ultra Video Dataset 4K | 16 | 10 seconds | 4096×2160 | Diverse motion | Few details, without text |
REDS (test SR) | 30 | 100 frames | 1280×720 | A lot of fast, difficult, diverse motion | Few details, without text |
Space-Time SR | 5 | 100 frames | 1280×720 | Diverse motion | Without small details and text |
Harmonic | — | — | 4096×2160 | — | — |
CDVL | — | — | 1920×1080 | — | — |
A few benchmarks in video super resolution were organized by companies and conferences. The purposes of such challenges are to compare diverse algorithms and to find the state-of-the-art for the task.
Benchmark | Organizer | Dataset | Upscale factor | Metrics |
---|---|---|---|---|
NTIRE 2019 Challenge | CVPR (Computer Vision and Pattern Recognition) | REDS | 4 | PSNR, SSIM |
Youku-VESR Challenge 2019 | Youku | Youku-VESR | 4 | PSNR, VMAF |
AIM 2019 Challenge | ECCV (European Conference on Computer Vision) | Vid3oC | 16 | PSNR, SSIM, MOS |
AIM 2020 Challenge | ECCV (European Conference on Computer Vision) | Vid3oC | 16 | PSNR, SSIM, LPIPS |
Mobile Video Restoration Challenge | ICIP (International Conference on Image Processing), Kwai | — | — | PSNR, SSIM, MOS |
MSU Video Super Resolution Benchmark 2021 | MSU (Moscow State University) | — | 4 | ERQAv1.0, PSNR and SSIM with shift compensation, QRCRv1.0, CRRMv1.0 |
The NTIRE 2019 Challenge was organized by CVPR and proposed two tracks for Video Super Resolution: clean (only bicubic degradation) and blur (blur added firstly). Each track had more than 100 participants and 14 final results were submitted.
The REDS dataset was collected for this challenge. It consists of 30 videos of 100 frames each. The resolution of ground-truth frames is 1280×720 and the tested scale factor is 4. PSNR and SSIM were used to evaluate the models' performance. The best participants' results are presented in the table:
Team | Model name | PSNR (clean track) | SSIM (clean track) | PSNR (blur track) | SSIM (blur track) | Runtime per image in sec (clean track) | Runtime per image in sec (blur track) | Platform | GPU | Open source |
---|---|---|---|---|---|---|---|---|---|---|
HelloVSR | EDVR | 31.79 | 0.8962 | 30.17 | 0.8647 | 2.788 | 3.562 | PyTorch | TITAN Xp | YES |
UIUC-IFP | WDVR | 30.81 | 0.8748 | 29.46 | 0.8430 | 0.980 | 0.980 | PyTorch | Tesla V100 | YES |
SuperRior | ensemble of RDN, RCAN, DUF | 31.13 | 0.8811 | — | — | 120.000 | — | PyTorch | Tesla V100 | NO |
CyberverseSanDiego | RecNet | 31.00 | 0.8822 | 27.71 | 0.8067 | 3.000 | 3.000 | TensorFlow | RTX 2080 Ti | YES |
TTI | RBPN | 30.97 | 0.8804 | 28.92 | 0.8333 | 1.390 | 1.390 | PyTorch | TITAN X | YES |
NERCMS | PFNL | 30.91 | 0.8782 | 28.98 | 0.8307 | 6.020 | 6.020 | PyTorch | GTX 1080 Ti | YES |
XJTU-IAIR | FSTDN | — | — | 28.86 | 0.8301 | — | 13.000 | PyTorch | GTX 1080 Ti | NO |
The Youku-VESR Challenge was organized to check models' ability to cope with the degradation and noise found in real videos from the Youku online video-watching application. The proposed dataset consists of 1000 videos, each 4–6 seconds long. The resolution of ground-truth frames is 1920×1080 and the tested scale factor is 4. The PSNR and VMAF metrics were used for performance evaluation. The top methods are presented in the table:
Team | PSNR | VMAF |
---|---|---|
Avengers Assemble | 37.851 | 41.617 |
NJU_L1 | 37.681 | 41.227 |
ALONG_NTES | 37.632 | 40.405 |
The challenge was held by ECCV and had two tracks on video extreme super resolution: the first track checks fidelity to the reference frame (measured by PSNR and SSIM), while the second checks the perceptual quality of the videos (MOS). The dataset consists of 328 video sequences of 120 frames each. The resolution of ground-truth frames is 1920×1080 and the tested scale factor is 16. The top methods are presented in the table:
Team | Model name | PSNR | SSIM | MOS | Runtime per image in sec | Platform | GPU/CPU | Open source |
---|---|---|---|---|---|---|---|---|
fenglinglwb | based on EDVR | 22.53 | 0.64 | first result | 0.35 | PyTorch | 4× Titan X | NO |
NERCMS | PFNL | 22.35 | 0.63 | — | 0.51 | PyTorch | 2× 1080 Ti | NO |
baseline | RLSP | 21.75 | 0.60 | — | 0.09 | TensorFlow | Titan Xp | NO |
HIT-XLab | based on EDSR | 21.45 | 0.60 | second result | 60.00 | PyTorch | V100 | NO |
The challenge's conditions are the same as in the AIM 2019 Challenge. The top methods are presented in the table:
Team | Model name | Params number | PSNR | SSIM | Runtime per image in sec | GPU/CPU | Open source |
---|---|---|---|---|---|---|---|
KirinUK | EVESRNet | 45.29M | 22.83 | 0.6450 | 6.1 s | 1 × 2080Ti | NO |
Team-WVU | — | 29.51M | 22.48 | 0.6378 | 4.9 s | 1 × TitanXp | NO |
BOE-IOT-AIBD | 3D-MGBP | 53M | 22.48 | 0.6304 | 4.83 s | 1 × 1080 | NO |
sr xxx | based on EDVR | — | 22.43 | 0.6353 | 4 s | 1 × V100 | NO |
ZZX | MAHA | 31.14M | 22.28 | 0.6321 | 4 s | 1 × 1080Ti | NO |
lyl | FineNet | — | 22.08 | 0.6256 | 13 s | — | NO |
TTI | based on STARnet | — | 21.91 | 0.6165 | 0.249 s | — | NO |
CET CVLab | — | — | 21.77 | 0.6112 | 0.04 s | 1 × P100 | NO |
The MSU Video Super-Resolution Benchmark was organized by MSU and proposed three types of motion, two ways of lowering resolution, and eight types of content in its dataset. The resolution of ground-truth frames is 1920×1280 and the tested scale factor is 4. 14 models were tested. PSNR and SSIM with shift compensation were used to evaluate the models' performance, and a few new metrics were proposed: ERQAv1.0, QRCRv1.0, and CRRMv1.0.[72] The top methods are presented in the table:
Model name | Multi-frame | Subjective | ERQAv1.0 | PSNR | SSIM | QRCRv1.0 | CRRMv1.0 | Runtime per image in sec | Open source |
---|---|---|---|---|---|---|---|---|---|
DBVSR | YES | 5.561 | 0.737 | 31.071 | 0.894 | 0.629 | 0.992 | — | YES |
LGFN | YES | 5.040 | 0.740 | 31.291 | 0.898 | 0.629 | 0.996 | 1.499 | YES |
DynaVSR-R | YES | 4.751 | 0.709 | 28.377 | 0.865 | 0.557 | 0.997 | 5.664 | YES |
TDAN | YES | 4.036 | 0.706 | 30.244 | 0.883 | 0.557 | 0.994 | — | YES |
DUF-28L | YES | 3.910 | 0.645 | 25.852 | 0.830 | 0.549 | 0.993 | 2.392 | YES |
RRN-10L | YES | 3.887 | 0.627 | 24.252 | 0.790 | 0.557 | 0.989 | 0.390 | YES |
RealSR | NO | 3.749 | 0.690 | 25.989 | 0.767 | 0.000 | 0.886 | — | YES |
In many areas where we work with video, we deal with different types of video degradation, including downscaling. The resolution of video can be degraded by imperfections of measuring devices, such as optical degradation and the limited size of camera sensors. Bad light and weather conditions add noise, and object and camera motion also decrease video quality. Super resolution techniques help to restore the original video and are useful in a wide range of applications.
Super resolution also helps, as a preprocessing step, in object detection and in face and character recognition. Interest in super resolution is growing with the development of high-definition computer displays and TVs.