Real-world applications (surveillance, forensics, etc.) require high-resolution (HR) images, but these are often unavailable due to the limitations and cost of optical sensors, giving rise to the need for Single Image Super-Resolution (SISR).
We tackle this problem by proposing a Triplet loss-based GAN for Real-world Super-Resolution (SRTGAN), which exploits the information in the LR image through a triplet loss formulation and improves both the adversarial training and the perceptual quality of the generated images.
The images below show a true LR image alongside the corresponding bicubically downsampled LR image obtained from the ground-truth HR.
We propose a Triplet loss-based patch GAN: a generator trained in a multi-loss setting and assisted by a patch-based discriminator.
We have implemented a triplet-based adversarial GAN loss, which exploits the information provided in the LR image (as a negative sample). This allows the patch-based discriminator to better differentiate between HR and LR images, thereby improving the adversary.
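As a rough illustration, here is a minimal PyTorch sketch of such a triplet-based adversarial loss for the generator. The pairing (SR output as anchor, ground-truth HR as positive, upsampled LR as negative), the `disc` callable, and the margin value are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def triplet_adversarial_loss(disc, sr, hr, lr_up, margin=1.0):
    """Triplet-based adversarial loss for the generator (sketch).

    Assumed pairing: SR output as anchor, ground-truth HR as positive,
    and the (upsampled) LR input as negative. `disc` returns a map of
    patch-wise scores.
    """
    d_sr = disc(sr)     # patch scores for the generated SR image
    d_hr = disc(hr)     # patch scores for the real HR image
    d_lr = disc(lr_up)  # patch scores for the LR (negative) sample

    # Pull SR scores toward HR scores and push them away from LR scores.
    dist_pos = F.mse_loss(d_sr, d_hr)
    dist_neg = F.mse_loss(d_sr, d_lr)
    return torch.clamp(dist_pos - dist_neg + margin, min=0.0)
```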
Training is performed on a fusion of content (pixel-wise L1 loss), GAN
(triplet-based), Quality Assessment (QA), and
perceptual losses, leading to superior quantitative and
subjective quality of SR results.
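To make this fusion concrete, here is a hedged sketch of a combined generator objective. The weight values, the `qa_net` (no-reference quality predictor) and `percep_net` (feature extractor, e.g. VGG) callables, and the sign of the QA term are illustrative assumptions; `triplet_adversarial_loss` refers to the sketch above.

```python
import torch.nn.functional as F

def generator_loss(sr, hr, lr_up, disc, qa_net, percep_net,
                   w_content=1.0, w_gan=5e-3, w_qa=1e-3, w_percep=1.0):
    # Hypothetical weights; the actual balance used in the paper may differ.
    l_content = F.l1_loss(sr, hr)                          # content: pixel-wise L1
    l_gan = triplet_adversarial_loss(disc, sr, hr, lr_up)  # GAN: triplet-based
    l_qa = -qa_net(sr).mean()                              # QA: reward predicted quality
    l_percep = F.mse_loss(percep_net(sr), percep_net(hr))  # perceptual: feature distance
    return (w_content * l_content + w_gan * l_gan
            + w_qa * l_qa + w_percep * l_percep)
```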
Our proposed framework consists of 2 major components:
Generator: It is trained in a multi-loss setting and comprises 3 modules:
LLIE: comprises a convolutional layer to extract low-level edge and structural information.
HLIE: comprises 32 Residual-In-Residual (RIR) blocks and a convolutional layer to extract high-level information.
SRRec: comprises an upsampling block and 2 convolutional layers to reconstruct the output at the required spatial dimensions.
Discriminator: A PatchGAN-based discriminator network that classifies each 70x70 patch of its input as real or fake. A structural sketch of both networks is given below.
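The module layout below follows the description above, but channel widths, block internals, the scale factor, the long skip connection, and the discriminator's exact layer stack are our assumptions, not the paper's verified configuration.

```python
import torch
import torch.nn as nn

class RIRBlock(nn.Module):
    """Simplified Residual-In-Residual block (the paper's block may be deeper)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    def __init__(self, ch=64, n_blocks=32, scale=4):
        super().__init__()
        # LLIE: one conv layer for low-level edge/structural features
        self.llie = nn.Conv2d(3, ch, 3, padding=1)
        # HLIE: 32 RIR blocks followed by a conv layer
        self.hlie = nn.Sequential(
            *[RIRBlock(ch) for _ in range(n_blocks)],
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        # SRRec: upsampling block + 2 conv layers
        self.srrec = nn.Sequential(
            nn.Upsample(scale_factor=scale, mode='nearest'),
            nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, x):
        feat = self.llie(x)
        feat = feat + self.hlie(feat)  # long skip connection (assumed)
        return self.srrec(feat)

class PatchDiscriminator(nn.Module):
    """Standard 70x70 PatchGAN layer stack (as in pix2pix)."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1),
            nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch * 2, ch * 4, 4, stride=2, padding=1),
            nn.BatchNorm2d(ch * 4), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch * 4, ch * 8, 4, stride=1, padding=1),
            nn.BatchNorm2d(ch * 8), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch * 8, 1, 4, stride=1, padding=1),
        )

    def forward(self, x):
        return self.net(x)  # map of per-patch real/fake scores
```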
For comparison of the SR results obtained using the proposed method with other state-of-the-art methods, we report PSNR and SSIM values, the standard measurements for the SR problem.
However, these metrics do not entirely reflect quality as perceived by humans. Therefore, we estimate an additional metric, LPIPS, a deep network-based full-reference perceptual quality assessment score. A lower LPIPS value indicates better visual quality. A minimal example of computing LPIPS is shown below.
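LPIPS can be computed with the `lpips` Python package; the random tensors below are stand-ins for actual SR and ground-truth HR images.

```python
import torch
import lpips  # pip install lpips

# Full-reference perceptual metric with an AlexNet backbone.
metric = lpips.LPIPS(net='alex')

# Images as (N, 3, H, W) tensors scaled to [-1, 1].
sr = torch.rand(1, 3, 256, 256) * 2 - 1
hr = torch.rand(1, 3, 256, 256) * 2 - 1

print(metric(sr, hr).item())  # lower LPIPS = better perceptual quality
```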
If you have any questions, please reach out to any of the above-mentioned authors.